
32nd ANNUAL CONFERENCE

OF THE

MILITARY TESTING ASSOCIATION

Orange Beach, Alabama

5 - 9 November 1990

Proceedings

Hosted by the

Naval Education and Training
Program Management Support Activity




32nd Annual Conference of the Military Testing Association

Chairperson: Commander Mary A. Adams
Conference Coordinator: Mr. Robert King

Hosted by the
Naval Education and Training Program
Management Support Activity

Orange Beach, Alabama
5 - 9 November 1990

Conference Committee

Chair, Program and Publications Subcommittee: Mr. Donald Lupone
Chair, Facilities Subcommittee: Mr. William Adams
Chair, Registration Subcommittee: Mr. Richard Lopez
Chair, Social Subcommittee: Mr. Robert Pallme
Chair, Public Relations Subcommittee: Mr. David Slover
Chair, Memento Subcommittee: Mr. Dean McCallum
Chair, Finance Subcommittee: LT Gary L. Waters
Site Coordinator: Dr. Charles Hesse


Acknowledgements

The success of the MTA Conference can be attributed to the dedication of individuals who worked many hours. The MTA Conference Committee members express their appreciation to the following people for their contributions to the Conference:

Facilities

Mr. William Adams (Chair)
DMC Charles Alvare
Ms. Sharon Benton
CMCS Thomas A. Browning
LICM Robert Carr
Mr. Dale Eckard
Mr. Al Farr
PHC Carl Hinkle
Ms. Jackie Hufman
CECS Billy F. Johnson
CECS John A. Lanclos
Ms. Fay Landrum
Mr. Frank Strayer

Finance

LT Gary L. Waters

Memento

Mr. Dean McCallum (Chair)
Ms. Catherine Warfield

Presentation Facilitators

Mr. Gerald Murphy (Chair)
Ms. Sharon Benton
FTCS Robert Bloomquist
AWCS David M. Devarney
ETCS R. Elliott
OTACS Robert H. Howe
RPC Jeffery L. Kringle
GSCS Robert Kuzirian
RPC Frank Logan
CWO Camilo D. Lomibao
OTAC Mark A. Lowe
JOC George Markfelder
VNC Gail M. Ravy
MRC Kenneth Shaw
AKCS William Sims

Program and Publications

Mr. Donald Lupone (Chair)
Mr. W. N. Presley Jr.
Ms. Wilma Scofield
Ms. Joanne Vendetti

Public Relations

Mr. David Slover (Chair)
ETC Steve Anderson
DMC Charles Alvare
SMC Vie Barera
Mr. Dave Bodin
Maxwell Buchanan
Mr. Norman Champagne
Code 05 Department
ATCS Joel Garner
Mr. Frank Harwood
MUCM David Johnson
Mr. Don Phillips
YN3 Mark Shinkle
AXCS Gary Spoon
Mr. Donald Wiggins
Mr. Emery Williams
Ms. Mary Wing

Registration

Mr. Richard Lopez (Chair)
Mr. Earl F. Roe
Mr. Michael Abney
ISC P. Buchan
STGCS P. D. Craig
Mr. Ronald Dougherty
Ms. Brenda Frederick
Ms. Susan Godwin
Mr. Larry Golding
STGC J. M. Griffin
Ms. Debbie Halberg
RMC C. I. Hannah
FTCS R. Langley
RMC M. McKay
AWC M. A. Morris
OTMC W. E. Parsons
AWC T. T. Pearson
Ms. Jane Reich
Ms. Laura Roberts
Ms. Anne Sayers
ISCM T. Schroeder
STGC E. C. Smith
AWCM J. R. Thompson
Ms. Marjorie Warsing
STSC J. C. Whitaker
Ms. Jo Ellen Wolf
OTMCS R. A. Wood
FTCM M. Young

Site Coordinator

Dr. Charles Hesse

Social

Mr. Robert Pallme (Chair)
GMCS Ricardo Andres
Ms. Ginger Andrews
Ms. Nora Matos
Mr. Joseph Neidig
Mr. Charles Warner


FOREWORD

These Proceedings of the 32nd Annual Conference of the Military Testing Association document the presentations given at paper and panel sessions during the conference. The papers represent a broad range of topics by contributors from the military, industrial, and educational communities, both foreign and domestic. It should be noted that the papers reflect the opinions of the authors and do not necessarily reflect the official policy of any institution, government, or armed service.


TABLE OF CONTENTS

1990 CONFERENCE COMMITTEE
ACKNOWLEDGEMENTS
FOREWORD
TABLE OF CONTENTS
OPENING SESSION

PAPER PRESENTATIONS - MANPOWER

101. TRUSCOTT, S., The Canadian Reserves: Current and Future Manpower
102. MARTELL, LTC Kenneth A. and WINN, LTC Dennis H., Accession Dynamics
103. Not Presented.
104. REEVES, Lt(N) D. T., Ethnic Participation in the Canadian Forces: Demographic Trends
105. ELIG, Timothy W., 1990 Army Career Satisfaction Survey
106. DEMPSEY, J. R., HARRIS, D. A., and WATERS, A. K., The Use of Artificial Neural Networks in Military Manpower Modeling
107. EDWARDS, Jack E., ROSENFELD, Paul, and THOMAS, Patricia J., Hispanics in Navy's Blue-Collar Civilian Workforce: A Pilot Study
108. Not Presented.

PAPER PRESENTATIONS - OCCUPATIONAL ANALYSIS

201. WALKER, C. L., Descriptors of Job Specialization Based on Job Knowledge Tests
202. RHEINSTEIN, Julie, O'LEARY, Brian S., and MCCAULEY, Jr., Donald E., Addressing the Issues of "Quantitative Overkill" in Job Analysis
203. O'LEARY, Brian S., RHEINSTEIN, Julie, and MCCAULEY, Jr., Donald E., Developing Job Families Using Generalized Work Behaviors
204. O'LEARY, Brian S., RHEINSTEIN, Julie, and MCCAULEY, Jr., Donald E., A Comparison of Holistic and Traditional Job-Analytic Methods
205. HUDSPETH, Dr. DeLayne R., FAYFICH, Paul R., and PRICE, John S., Squadron Leader, Automating the Administration of USAF Occupational Surveys
206. MENCHACA, Capt Jose, Jr., GUTHALS, 2Lt Jody A., OLIVIER, Lou, and PFEIFFER, Glenda, MPT Enhancements to the Occupational Research Data Bank
207. PHALEN, William J., MITCHELL, Jimmy L., and HAND, Darryl K., ASCII CODAP: Progress Report on Applications of Advanced Analysis Software
208. KLEIN, Paul, Professional Success of Former Officers in Civilian Occupations
209. FINLEY, Dorothy L. and YORK, William J., Jr., A Military Occupational Specialty (MOS) Research and Development Program: Goals and Status
210. YORK, William J., Jr. and FINLEY, Dorothy L., Application of the Job Ability Assessment System to Communication Systems Operators
211. ARNDT, K., Preferences for Military Assignments in German Conscripts
212. SCHAMBACH, S. B., Aptitude-Oriented Replacement of Conscript Manpower in the German Bundeswehr
213. VAUGHAN, David S., MITCHELL, Jimmy L., KNIGHT, J. R., BENNETT, Winston R., and BUCKENMYER, David V., Developing a Training Time and Proficiency Model for Estimating Air Force Specialty Training Requirements of New Weapon Systems
214. Not Presented.

PAPER PRESENTATIONS - TRAINING

301. MCCORMICK, D. L. and JONES, P. L., Evaluating Training Program Modifications
302. Not Presented.
303. Not Presented.
304. DIEHL, Grover E., The Effect of Reading Difficulty on Correspondence Course Performance
305. PARCHMAN, Steve W., ELLIS, John A., and MONTAGUE, William E., Navy Basic Electricity Theory Training: Past, Present, and Future
306. Not Presented.
307. Not Presented.
308. STEPHENSON, S. D. and STEPHENSON, J. A., Using Event History Techniques to Analyze Task Perishability: A Simulation
309. STEPHENSON, S. D., A First Look at the Effect of Instructor Behavior in a Computer-Based Training Environment
310. BESSEMER, D. W., Transfer of Training with Networked Simulators
311. Not Presented.
312. DART, 1Lt Todd S., GUTHALS, 2Lt Jody A., and BERGQUIST, Maj Timothy M., Contingency Task Training Scenario Generator
313. MIRABELLA, Angelo, Cooperative Learning in the Army: Research and Application
314. EGGENBERGER, J. C., PhD, and CRAWFORD, R. L., PhD, Battle-Task/Battleboard Training Application Paradigm and Research Design
315. LICKTEIG, Carl W., KOGER, Major Milton E., and HESLIN, Captain Thomas F., Combat Vehicle Commander's Situational Awareness: Assessment Techniques
316. FEHLER, F., An Aviation Psychological System for Helicopter Pilot Selection and Training
317. SPECTOR, J. M. and MURAIDA, D. J., Analyzing User Interaction with Instructional Design Software
318. PFEIFFER, M. G. and EVANS, R. M., Forecasting Training Effectiveness (FORTE)
319. PHELPS, Dr. Ruth H. and ASHWORTH, MAJ Robert L., Jr., Cost-Effectiveness of Home Study Using Asynchronous Computer Conferencing for Reserve Component Training

PAPER PRESENTATIONS - TESTING

401. RUDOLPH, Sandra A., Test Design and Minimum Cutoff Scores
402. KOBRICK, J. L., JOHNSON, R. F., and MCMENEMY, D. J., Subjective and Cognitive Reactions to Atropine/2-PAM, Heat, and BDU/MOPP-IV
403. LESCREVE, F. and SLOWACK, W., Guts: A Belgian Gunner Testing System
404. Not Presented.
405. Not Presented.
406. KENNEDY, R. S., DUNLAP, W. P., FOWLKES, J. E., and TURNAGE, J. J., Characterizing Responses to Stress Utilizing Dose Equivalency Methodology
407. Not Presented.
408. ARABIAN, Jane M. and SCHWARTZ, Amy C., Job Sets for Efficiency in Recruiting and Training (JSERT)
409. THAIN, John W., Development of a New Language Aptitude Battery
410. WILLIAMS, J. E., STANLEY, P. P., and PERRY, C. M., Implementation of Content Validity Ratings in Air Force Promotion Test Construction
411. JEZIOR, B. A., POPPER, R., LESHER, L. L., GREENE, C. A., and INCE, V., Interpreting Rating Scale Results: What Does a Mean Mean?
412. SANDS, W. A., Joint-Service Computerized Aptitude Testing
413. Not Presented.
414. O'BRIEN, L. H., Assessment of Aptitude Requirements for New or Modified Systems
415. Presented in Symposium 803D.
416. SCHWARTZ, Amy C. and SILVA, Jay M., The Practical Impact of Selecting Tow Gunners with a Psychomotor Test
417. BRADLEY, Capt. J. P., Validation of a Naval Officer Selection Board
418. Not Presented.
419. HANSON, Mary Ann, and BORMAN, Walter C., A Situational Judgment Test of Supervisory Knowledge in the U.S. Army
420. Presented in Symposium 803B.
421. Not Presented.
422. BUCK, Lawrence S., Context Effects on Multiple-Choice Test Performance
423. SALTER, MAJ Charles A., LESTER, Laurie S., LUTHER, Susan M., and LUISI, Theresa A., Dietary Effects on Test Performance
424. MAEL, F. A., What Makes Biodata Biodata?
425. VAN HEMEL, S., ALLEY, F., BAKER, H., and SWIRSKI, L., Job Sample Test for Navy Fire Controlman
426. BAKER, H., SANDS, M., and SPOKANE, A., ASVIP: An Interest Inventory Using Combined Armed Services Jobs
427. SPIER, M., DHAMMANUNGUNE, S., BAKER, H., and SWIRSKI, L., Predicting Performance with Biodata
428. ALBERT, W. G. and PHALEN, W. J., Development of Equations for Predicting Testing Importance of Tasks
429. DITTMAR, Martin J., HAND, Darryl K., PHALEN, William J., and ALBERT, W. G., Estimating Testing Importance of Tasks by Direct Task Factor Weighting
430. Not Presented.
431. BRADY, Elizabeth J. and RUMSEY, Michael G., Upper Body Strength and Performance in Army Enlisted MOS
432. PALMER, D. R., WHITE, L. A., and YOUNG, M. C., Response Distortion on the Adaptability Screening Profile (ASP)
433. BANDERET, L. E., SHUKITT-HALE, B. L., LIEBERMAN, H. R., SIMPSON, LTC R. L., and PEREZ, CPT P. J., Psychometric Properties of a Number Comparison Task: Medium and Format Effects
434. BANDERET, L. E., O'MARA, M., PIMENTAL, N. A., RILEY, SGT R. H., DAUPHINEE, SSG D. T., WITT, SSG C. E., and TOYOTA, SGT R. M., Subjective States Questionnaire: Perceived Well-Being and Functional Capacity
435. ROMAGLIA, CIC Diane L. and SKINNER, Jacobina, Validity of Grade Point Average: Does the College Make a Difference?
436. Not Presented.
437. HANSEN, H. D., Flight Psychological Selection System - FPS-80: A New Approach to the Selection of Aircrew Personnel
438. MELTER, A. H. and MENTGES, W., Leadership in Aptitude Tests and in Real-Life Situations
439. PUTZ-OSTERLOH, W., Computer-Based Assessment of Strategies in Dynamic Decision Making
440. RODEL, G., The "Information and Counseling Action" (IBA) of the German Navy
441. CONNER, Dr. Harry B., Troubleshooting Assessment and Enhancement (TAE) Program: Test and Evaluation Results
442. BUSCIGLIO, Henry H., Incrementing ASVAB Validity with Spatial and Perceptual-Psychomotor Tests
443. RUSHANO, T. M., Item Content Validity: Its Relationship with Item Discrimination and Difficulty
444. FIEDLER, E., The Air Force Medical Evaluation Test, Basic Military Training, and Character of Separation
445. TRENT, T., QUENETTE, M. A., and LAMBS, G. J., Implementation of the Adaptability Screening Profile (ASP)
446. MCGEE, Steve D., Utilization of Word Processors/Computers vs Typewriter for U.S. Navy Typing Performance Tests

PAPER PRESENTATIONS - HUMAN FACTORS

501. THARION, W. J., MARLOWE, B. E., KITTREDGE, R., HOYT, R., and CYMERMAN, A., Acute High Altitude Exposure and Exercise Decrease Marksmanship Accuracy
502. Not Presented.
503. COLLINS, Dennis D., Human Performance Data for Combat Models
504. TURNAGE, Janet J., KENNEDY, Robert S., and JONES, Marshall B., Trading Off Performance, Training, and Equipment Factors to Achieve Similar Performance
505. BAYES, Andrew H., Final Report, Computer Assisted Guidance Information Systems

PAPER PRESENTATIONS - LEADERSHIP

601. Not Presented.
602. ALDERKS, Cathie E., Vertical Cohesion Patterns in Light Infantry Units
603. LINDSAY, Twila J. and SIEBOLD, Guy L., The Use of Incentives in Light Infantry Units
604. SIEBOLD, Guy L., Cohesion in Context
605. WALDKOETTER, R. O., WHITE, W. R., Sr., and VANDIVIER, P. L., Evaluation of the Army's Finance Support Command Organizational Concept
606. STEINBERG, Alma G. and LEAMAN, Julia A., Leader Initiative: From Doctrine to Practice
607. Not Presented.
608. Not Presented.
609. Not Presented.
610. CLARK, Herbert J., Starting a TQM Program in an R&D Organization

PAPER PRESENTATIONS - MISCELLANEOUS TOPICS

701. Not Presented.
702. ROOZENDAAL, Col. G. J. C., An Officer, a Social Scientist, (and possibly a gentleman) in the Royal Netherlands Army (RNLA)
703. GOLDBERG, Edith Lynne, SHEPOSH, John P., and SHETTEL-NEUBER, Joyce, Acceptance of Change: An Empirical Test of a Causal Model

PAPER PRESENTATIONS - SYMPOSIA (ALL CATEGORIES)

801. TWEEDDALE, J. W., Symposium: The Naval Reserve Officers Training Corps (NROTC) Scholarship Selection System
801A. TWEEDDALE, J. W., Research Needs for Naval Reserve Officers Training Corps Scholarship Selection
801B. HAWKINS, R. B., Gathering and Using Naval Reserve Officers Training Corps Scholarship Information
801C. EDWARDS, Jack E., BURCH, Regina L., and ABRAHAMS, Norman M., Validation of the Naval Reserve Officers Training Corps Quality Index
801D. BORMAN, Walter C., OWENS-KURTZ, C. K., and RUSSELL, T. L., Development and Implementation of a Structured Interview Program for NROTC Selection
801E. HANSON, Mary Ann, PAULLIN, Cheryl, and BORMAN, Walter C., Development of an Experimental Biodata/Temperament Inventory for NROTC Selection
802. 802 through 8025 Not Presented.
803. BORMAN, W., BOSSHARDT, M., DUBOIS, D., HOUSTON, J., CRAWFORD, K., WISKOFF, M., ZIMMERMAN, R., and SHERMAN, F., Psychological Applications to Ensuring Personnel Security: A Symposium
803A. DUBOIS, D., BOSSHARDT, M., and WISKOFF, M., The Investigative Interview: A Review of Practice and Research
803B. ZIMMERMAN, R. A. and WISKOFF, M. F., Utility of a Screening Questionnaire for Sensitive Military Occupations
803C. BOSSHARDT, M., DUBOIS, D., and CRAWFORD, K., Continuing Assessment of Cleared Personnel in the Military Services
803D. HOUSTON, J., WISKOFF, M., and SHERMAN, F., A Measure of Behavioral Reliability for Marine Security Guards
804. HARRIS, J. H., CAMPBELL, Charlotte H., and CAMPBELL, Roy C., Symposium: Job Performance Testing for Enlisted Personnel
804A. DOYLE, Earl L. and CAMPBELL, R. C., Navy: Hands-On and Knowledge Tests for the Navy Radioman
804B. EXNER, Maj P. J., CRAFTS, J. L., FELKER, D. B., BOWLER, E. C., and MAYBERRY, P. W., Interrater Reliability as an Indicator of HOPT Quality Control Effectiveness
804C. Not Presented.
804D. CAMPBELL, Charlotte H. and CAMPBELL, Roy C., Army: Job Performance Measures for Non-Commissioned Officers
805. BROOKS, J. T., COLE, W. J., HARRIS, J. C., STANLEY II, P. P., and TARTELL, J. S., The USAF Occupational Measurement Squadron: Its Organization, Products, and Impact

PAPER PRESENTATIONS - VENDOR PRESENTATIONS

901. BROWN, Gary C., The Examiner

CONFERENCE INFORMATION

MINUTES OF THE STEERING COMMITTEE MEETING
LIST OF STEERING COMMITTEE MEETING ATTENDEES
AGENCIES REPRESENTED BY MEMBERSHIP ON THE MTA STEERING COMMITTEE
BY-LAWS OF THE MILITARY TESTING ASSOCIATION
LIST OF CONFERENCE REGISTRANTS
INDEX OF AUTHORS


32nd Annual Conference of the Military Testing Association
Orange Beach, Alabama
5 November 1990

OPENING SESSION

Opening Remarks: Commander Mary A. Adams, Head, Naval Advancement Center Department, Naval Education and Training Program Management Support Activity; Pensacola, Florida

Welcome: Mr. George W. Tate, Executive Vice President, Orange Beach Chamber of Commerce; Orange Beach, Alabama

Keynote Address: Lt General Donald W. Jones, Deputy Assistant Secretary of Defense (Military Manpower and Personnel Policy)


THE CANADIAN RESERVES: CURRENT AND FUTURE MANPOWER*

Susan R. Truscott
Directorate of Social and Economic Analysis
Operational Research and Analysis Establishment
Department of National Defence
Ottawa, Canada

* The views and opinions expressed in this paper are those of the author and not necessarily those of the Department of National Defence.

BACKGROUND

In 1987, the Canadian White Paper on Defence outlined numerous policy changes for the Canadian Forces. One of these was the Total Force Concept. In brief, it stated that the distinction between the regular force and the reserves is to be reduced and the responsibility for national defence is to be shared. To fulfil its commitments, Canada must look to a peacetime structure that can be rapidly and effectively augmented by a trained reserve force composed of part-time members. A mixed operational force is to be formed, where regular and reserve force personnel are integrated in units. The ratio of full-time to part-time personnel will be dependent on the nature and requirements of the unit.

Currently, regular force members outnumber reservists by a ratio of more than three to one. To assume a greater role in the defence of Canada, the reserves are to be revitalized and expanded. The recruitment of a large number of reservists, and perhaps different types of reservists, over the next decade will present a challenge to the reserves, in light of current socio-demographic and economic trends such as a declining youth population and broader employment opportunities. Recruiting the required number of reservists may necessitate new initiatives - for example, the widening of the traditional recruiting population and the engagement of new recruiting and advertising strategies.

Several studies have been undertaken to provide data on a force that has, at least from a research point of view, been largely ignored in recent years. The focus of this paper is on a three-phase study of the Primary Reserves, conducted by the Directorate of Social and Economic Analysis. During Phase One, qualitative information was collected through interviews with key reserve personnel. In Phase Two, a survey was administered to a random sample of reservists to identify the characteristics, attitudes and values of reservists. The study also focused on retention and the internal organization of the reserves. A national attitude survey of 6000 Canadians was conducted in Phase Three to assess knowledge of the reserves, attitudes toward the reserves and the propensity of Canadians to join the reserves. Preliminary results of this study, and their implications in light of socio-demographic trends in the Canadian population and organizational changes planned for the reserves, are highlighted in this paper. A profile of reservists is presented first. This is followed by data on the Canadian public's knowledge of, and attitudes toward, the reserves.

FINDINGS

A. SURVEY OF RESERVISTS

The reserves are dominated by young, single males. At the time of the survey, thirty-one percent of the reservists were students and 18% were unemployed. Together, these two groups comprise almost one-half of the reserves. Of the remaining 51% who were employed, about 24% were Class B or C Reservists, and thus in continuous full-time employment with the military. In comparison to 1976, there has been only a modest change in the percentage who are employed. However, there has been a substantial increase in the percentage who are unemployed and a decrease in the percentage who are students. This is an indication of how closely tied reserve recruitment and retention is to the employment situation in the Canadian economy, and in particular the regional economy - relationships well documented in regular force research.

The reserves are attracting and/or retaining more personnel who have or are achieving post-secondary education. Of the reservists who were attending school in 1976, 66% were in high school, 17% were in college and 16% were enroled in university. The recent survey indicated that 50% of the students were in high school, 20% were in college and 30% were in university. The increase in reservists with, or attaining, post-secondary education reflects the greater emphasis on education in society, the greater technical demands in some areas of the forces, and the use of reserve activity to subsidize post-secondary education costs.

Many reservists have prior experience with the military - forty percent had been members of the cadets and 20% had previously been in another reserve unit. Ten percent of primary reservists had served in the regular force. Ex-service members provide expertise and training that is difficult, if not impossible, to recruit from the civilian work force. There are some 65,000 ex-regular force members who would be suitable for, but are not members of, the reserves (Bossenmaier, 1987).

Our study indicated that word of mouth was the most common first source of information on the reserves. Only 7% of reservists reported that formal advertising had provided their first information on the reserves. National advertising campaigns have not been the focus of reserve recruiting in the past; however, they are an effective means of directing a specific message to a target population. Indeed they may provide a very functional mechanism to enhance public awareness of the reserves.

Through the reserves, a proportion of the population benefit from receiving some military training and experience and can be of use to the military if mobilization is required. In addition, they contribute to the "Defence Community" in Canada; that is, sub-groups of Canadians with military knowledge and experience and an understanding of the Defence mandate. The reserves have both organizational and societal responsibilities; thus public relations campaigns should be designed to appeal to those who view the reserves as a part-time job and to those who view it as a professional calling.

Based on estimates from Statistics Canada, the general population will continue to age due to birth rates below replacement levels. It is expected, however, that the growth rate in the country will be maintained through increased immigration. These projections carry major implications for the reserves. A decline in the youth population means that the traditional recruiting base will shrink and that the reserves may increasingly have to rely on women, older persons, the employed, ex-regular force members and first generation Canadians to fill its ranks. These are all subgroups of the population currently under-represented in the reserves. By extending its age restrictions, older members in certain trades may be encouraged to stay.

B. NATIONAL ATTITUDE SURVEY

The National Attitude Survey was administered to 6000 Canadians between the ages of 15 and 50 to assess the level of awareness of the reserves, attitudes toward the reserves, and the propensity of various sub-groups of Canadians to join the reserves. Eighty percent of those interviewed had heard of the reserves, but few admit to having a great deal of awareness of the reserves or their activities. In fact, just over 40% of those interviewed said they were not at all, or not very, aware of the reserves or their activities.

Many Canadians reported that word of mouth was their most significant source of information on the reserves. Forty-five percent of Canadians reported that friends, family members, relatives or teachers were their main source for information about the reserves. The media were reported as the most significant sources for 42% of those interviewed; thus far more important than was the case for reserve members.

Twenty percent of those interviewed, and 25% of those 15-24 years of age, had considered joining the reserves within the previous year. Addressing the future, about 5% of the sample said that they were somewhat to very likely to consider joining the reserves. This was the case for 10% of those aged 15-24. Interest in joining the reserves was highest among 15-24 year olds in the Atlantic provinces and Quebec. Those who indicated a willingness to join the reserves most frequently responded with patriotic reasons. Monetary/work experience reasons, followed by social reasons, were also common responses. Older persons were more likely to report patriotic reasons for interest in the reserves, while pragmatic reasons were more common among the young. A lack of interest, family, school and work responsibilities, and age were the most common reasons provided by those not interested in joining the reserves.

SUMMARY

In summary, the reserves are still very dependent on young Canadians to fill their ranks. While there are benefits in recruiting from this sub-group of the population, there are also considerable drawbacks. The historical exclusion of students from mobilization, high attrition rates, and the resulting continual training requirements are examples. With little doubt, attrition rates will continue to remain high among the reserves. Principally, the factors that draw reservists out of the reserves are related to their age and stage of life. This suggests that attrition rates may be improved by attracting a different type of individual to the reserves, such as older persons, ex-regular force members or the civilian employed. Further efforts will be made to explore attitudes toward the reserves, and the propensity to join, across sub-groups of the population currently under-represented in the reserves, as well as factors which may currently limit or restrict participation. Reserve manpower and related manning issues should be reviewed in light of the new policy role for the reserves.

REFERENCES

Bossenmaier, G. (1987). Potential Manpower Resources for Mobilization, Part 1 (ORAE Project Report No. PR434). Ottawa, Ontario: Directorate of Manpower Analysis.

Goodfellow, T. H. (1976). Reserve Force Survey (ORAE Project Report No. PR62). Ottawa, Ontario: Directorate of Social and Economic Analysis.

Popoff, T. and Truscott, S. (1987). A Sociological Study of the Reserves: Phase Two Trends and Implications for the Future (ORAE Project Report No. PR440). Ottawa, Ontario: Directorate of Social and Economic Analysis.

Sinaiko, W. H. (1985). Part-Time Soldiers, Sailors and Airmen, Reserve Force Manpower in Australia, Canada, New Zealand, the U.K. and the U.S. (Technical Panel 3 Report (UTP-3)). Washington, DC: The Technical Cooperation Program, Subgroup U.

Truscott, S. (1987). A Sociological Study of the Reserves: Phase Two Summary of Research Findings (DSEA Staff Note No. 4/88). Ottawa, Ontario: Directorate of Social and Economic Analysis.


ACCESSION DYNAMICS

LTC Kenneth A. Martell and LTC Dennis H. Winn
Department of the Army Headquarters,
Office of the Deputy Chief of Staff for Personnel
Pentagon, Arlington VA

Paper presented at the 32nd Annual Conference of the Military Testing Association, November 1990.

In order for the recruiting command (USAREC) to achieve its aggregate accession mission, there must also be specific MOS requirements to match the accession mission. The personnel command (PERSCOM) develops these MOS requirements. This paper will briefly describe the process and interactions among the various systems currently used to get the right number and mix of soldiers to support the Army's end strength requirements. This paper will define the challenges at the MOS and aggregate level of detail, exacerbated by the current changing environment, faced by the models and programs. Timing (planning/forecasting), structure versus MOS program reductions, training capacity, and the effects of abrupt execution year accession changes will be covered. A brief description of some specific accession policies will also be addressed. In addition, potential accommodations by the system to changing demands will be presented.

MOS Level of Detail

Under current procedures, all Army enlistees are assigned a job or a Military Occupational Specialty (MOS) upon initially contracting. Thus, for Army Recruiting Command (USAREC) to achieve its aggregate accession mission, there must also be specific MOS requirements to match the accession mission. These MOS requirements by grade and quantity are developed centrally at the Army's Personnel Command (PERSCOM) and in the aggregate are MOS programs.

MOS requirements are identified using a planning model called MOSLS (MOS Level System). Inputs to MOSLS include: AAMMP (Active Army Military Manpower Program) developed from the ELIM-COMPLIP (Enlisted Loss Inventory-Computation of Manpower Using Linear Programming), projected authorization data from the PMAD (Personnel Management Authorization Document) or UAD (Updated Authorization Data), and inventory data from the EMF (Enlisted Master File).

MOSLS then determines the recommended MOS and grade mix for the MOS inventories, the gains to the MOS required to meet those inventories, and the training needed to support those gains. While some of the gains to the MOS will come through reenlistments and reclassifications, the majority will come through USAREC's accession mission. These accessions are referred to as the MOS programs.

Training to support these programs is obtained through the Structure Manning Decision Review (SMDR) process. The SMDR is held annually and allows each of the Army components (Active Army, Reserves, and National Guard) to express the training needed to support its MOS programs. These program requirements are evaluated against the training capacity in the Training and Doctrine Command (TRADOC) and, once approved, become TRADOC's training mission. The approved training requirements are referred to as the Army Program for Individual Training (ARPRINT), which identifies for the individual TRADOC schools what their training mission is for the fiscal year.

The TRADOC schools in turn develop individual class schedules to support their training mission. These class schedules are placed into the Army Training Requirements and Resources System (ATRRS) and ultimately into the automated accessioning system, REQUEST. The total of the "seats" in the classes for a particular MOS for the year is equal to the MOS programs developed through MOSLS and approved in the SMDR. USAREC recruits against these classes and by filling the individual class "seats" also fills the annual MOS programs.

The above process works well in a stable, predictable environment; however, as seen in recent years and especially now with the uncertainties of reducing the manpower in the Army or "downsizing", the environment is anything but stable or certain. Discussed below are some of the problems encountered in managing accessions during these unique times.

Timing. The SMDR works in the future. For example, the SMDR held in April and May 1990 built the FY93 training programs. Although FY92 was revalidated and FY93 was given a first look, the major work was on FY93 and it is that year's training which will be approved in the ARPRINT in the summer of 1990. Projections in the best of circumstances are chancy; in a downsizing environment, the training that is "bought" and approved in FY90 may no longer reflect the requirements when FY93 finally arrives. Critical to the MOSLS process are the known and projected authorizations (PMAD) based on projected force structure. If the structure changes, then the MOS requirements change, and thus the training requirements. While there are mechanisms to make adjustments to the training programs, because the SMDR is so closely tied to the budget and resourcing process, significant changes may not be satisfied in a timely manner.

Structure Reductions. PERSCOM can adjust its MOS programs throughout the year to match the accession mission changes. Generally, these changes have been reductions in USAREC's mission. While the Deputy Chief of Staff for Personnel (DCSPER) Accession Division can easily reduce the aggregate requirement, PERSCOM cannot reduce the supporting MOS programs without knowing what changes are being made in structure. Experience in the past and currently is that decisions on structure reductions lag behind decisions to reduce the accession missions. PERSCOM then is left with a couple of alternatives: make a best guess, in coordination with the Office of the Deputy Chief of Staff for Operations (ODCSOPS), on what structure is coming out and adjust accordingly [but the risk is that the guess will be wrong and irreversible decisions on MOS level accessions will have been made]; or leave the MOS programs untouched, with the result being more available program and training than there is accession mission to support. If the accession mission is 1,000 but there are 2,000 MOS programs available, only 1,000 will be recruited. With that excess, we allow USAREC and the applicant to dictate what MOS programs are filled. The risk is that the wrong MOS programs will be filled.


Training Capacity. The training that was approved in the SMDR was at the annual level. The individual school or installation must convert that requirement to class schedules and spread the requirement across the year. Generally, that spread will be made on a straight line, consistent with the capacity in each course. For example, if the requirement for an MOS is 120 with a class optimum size of 20, the TRADOC school will likely schedule 6 classes conducted every other month. The concept is the same for basic training. While TRADOC does have some surge capacity, this straight line scheduling is a reflection of the fact that TRADOC is budgeted and manned on an annual basis. The physical plant (e.g. billets) and training equipment (e.g. simulators, tanks) may also dictate a straight line schedule with limited surge capability.
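
For illustration, a minimal sketch of the straight-line scheduling arithmetic in the example above (the function and its interface are hypothetical, not part of any TRADOC system):

    # Hypothetical sketch: spread an annual MOS training requirement into
    # evenly spaced classes of a fixed optimum size ("straight line" scheduling).
    from math import ceil

    def straight_line_schedule(annual_requirement, class_size, months=12):
        """Return (start_month, seats) pairs spaced evenly across the year."""
        n_classes = ceil(annual_requirement / class_size)   # 120 / 20 -> 6 classes
        spacing = months / n_classes                         # 12 / 6 -> every other month
        return [(round(i * spacing) + 1, class_size) for i in range(n_classes)]

    print(straight_line_schedule(120, 20))
    # [(1, 20), (3, 20), (5, 20), (7, 20), (9, 20), (11, 20)]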

TRADOC's capability in recent years has been stretched to the limit. Faced with structure and budget cuts itself, TRADOC has recently indicated that it can no longer surge. They have requested HQDA's support in effecting a more even flow into the training base with about 35% of the annual training capacity in the 4th quarter. While this should allow all three components (Active Army, Reserves, and National Guard) to still meet their missions while taking advantage of the prime summer recruiting months, the ability of the Army to slide the accession mission into the 4th quarter to save Military Personnel Account (MPA) dollars will be restricted.

Execution Year Changes. When the mission is slid to the 4th quarter to save dollars, that shift can cause critical training seats to be missed which perhaps cannot be made up in the future. Also, because of the way MOS programs are counted, shifting the mission into August or September can take that mission "across the training year" line into the next training year. This is because MOS programs are based on the start date of the MOS-producing course. Thus, an accession in an Advanced Individual Training (AIT) course for an MOS in August 1990 counts against the FY91 program because the MOS course will start in October, 8 weeks after the soldier accessed and entered the 8-week Basic Training (BT) course; for MOS with One Station Unit Training (OSUT), BT and AIT are merged so that when the soldier accesses and enters OSUT, he/she is starting the MOS-producing course. The result is too low a mission to support the MOS programs in one year and excess mission for the available training in the subsequent "training program year". PERSCOM must then take action to align the aggregate programs with the new mission by reducing the program level. These MOS program reductions can have the same adverse effect discussed above in Structure Reductions.
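
For illustration, a minimal sketch of this counting rule (the function names are hypothetical; the 8-week BT length and the rule that the MOS-producing course's start date governs the count are taken from the text):

    # Hypothetical sketch: an accession counts against the fiscal year in which
    # its MOS-producing course starts (OSUT merges BT and AIT, so it starts at once).
    from datetime import date, timedelta

    BT_WEEKS = 8  # Basic Training length cited in the text

    def fiscal_year(d):
        """US federal fiscal year: FY N runs 1 October N-1 through 30 September N."""
        return d.year + 1 if d.month >= 10 else d.year

    def program_year(accession_date, osut):
        """Fiscal year the accession is counted against."""
        mos_start = accession_date if osut else accession_date + timedelta(weeks=BT_WEEKS)
        return fiscal_year(mos_start)

    # August 1990 accession bound for a separate AIT course: the MOS course
    # starts in October, so the accession counts against FY91.
    print(program_year(date(1990, 8, 6), osut=False))  # 1991
    print(program_year(date(1990, 8, 6), osut=True))   # 1990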

Aggregate Level of Detail

Input to the ELIM is made during the budget process and includes expected requirements for each of the months of the year/years being developed. Key considerations in this process are: recruiting capability, budget, and training capability.

Recruiting capability is defined as what USAREC believes it can handle for each month, quarter, and annual mission. The aggregate numbers for each of these periods are developed from contract recruiting history and not necessarily accession history. The contract capability is developed by looking at what USAREC expects it can achieve in the various specific mission categories: combinations of gender, nonprior service/prior service, high school graduation status, and AFQT (the Armed Forces Qualification Test, part of the Armed Services Vocational Aptitude Battery or ASVAB). Contracting missions to the USAREC commanders in the field takes a 6 month lead time and is, in the aggregate, influenced by the accession mission. However, the accession flow to the training base is controlled by USAREC headquarters and at the Department of the Army level. In other words, the recruiting commanders are not greatly influenced by gyrations in the monthly accession requirements.

The influence of the market, incentives, recruiter tools, and the economic environment, but especially recent recruiting history, drives the aggregate numbers. USAREC deals with Reception Station Months (RSM) vice calendar months. The RSM allows flexibility for shipping recruits to the 8 (7 after closing of the FT. Bliss Reception Battalion) reception battalions. Conversion of RSM to calendar numbers is not an accurate science since it is based on the projected rate of shipping during each RSW (week). The conversion is important to identify the costs of each calendar month, which are determined by man-year costs prorated by months of the year. The present dollar amount for one full year is $17,657. Thus if 5,000 recruits are shipped (now referred to as accessions) in March, the sixth month of the fiscal year, this number equates to a cost of 5,000 x 6/12, or 2,500, x $17,657, which is $44,142,500 in MPA cost. On the whole the conversion has been relatively accurate.
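
For illustration, a minimal sketch of the man-year proration in the example above (the helper is hypothetical; the $17,657 man-year figure and the March example come from the text):

    # Hypothetical sketch: prorate the quoted man-year cost by the fraction of
    # the fiscal year a recruit is actually on the rolls.
    MAN_YEAR_COST = 17_657  # dollars per enlisted man-year (figure quoted above)

    def mpa_cost(accessions, months_remaining):
        """MPA dollars for recruits who serve months_remaining of the 12-month fiscal year."""
        man_years = accessions * months_remaining / 12
        return man_years * MAN_YEAR_COST

    # 5,000 recruits shipped in March, with six months of the fiscal year left
    # to pay for: 2,500 man-years, or $44,142,500 in MPA cost.
    print(mpa_cost(5_000, 6))  # 44142500.0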

Budgetary requirements to save Military Personnel Account (MPA) dollars tend to influence greatly the enlisted accession requirements. For instance, large amounts of MPA dollars can be saved simply by shifting accessions from the beginning of a year to the end of the year. Movement of accessions to the 4th quarter has been done for the last few years and is already included in the FY91 accession requirements by calendar month. Ironically, this shift to the 4th quarter does not cause USAREC as much concern as it does TRADOC. The major cuts to the Army budget in the form of dollars and endstrength have a significant impact on the adjustments to the accession programs for the current and future years.

USAREC preference is for the following quarterly RSM breakdown:

    Quarter    Percentage
    First      21-23
    Second     22
    Third      19
    Fourth     37

First and fourth quarters are the best when considering the market. Some dynamics to consider here, which are hidden in the numbers, include the fact that high quality soldiers are easier to enlist with shorter Terms of Service (TOS), the size of next year's mission influences recruit entry into the DEP (Delayed Entry Program), and the higher the aggregate quality goals the tougher the recruiting (unless resources for recruiting are commensurately increased).

TRADOC prefers an even flow into training at the rate of 8.3% per month of input from USAREC. They have stated that the surge of training requirements in the fourth quarter is near impossible to handle. The contention is based on the 4th quarter surge which includes Active, Reserve and National Guard. Acknowledging that even flow is impossible, the fourth quarter surges, closing on 40%, are not resource supportable, especially with the first three quarters under capacity. It should be noted that the Army is looking at the feasibility of clustering or combining certain MOS, eliminating MOS, consolidating training, and eliminating BT/AIT/Reception Battalions (e.g., FT. Bliss); this, although long term, will have a positive effect on some of the problems mentioned.

Specific Accessions

Females. The female enlisted accession floor was initially set in FY88. The most recent floor is based on slight growth, but growth nevertheless, of 0.2% to 0.5% each year in the female enlisted end strength compared to the total enlisted endstrength. This allows female endstrength to be reduced as the Army downsizes but, perhaps, at a lower rate than males. Final content for FY96 will be 13.3% of the enlisted endstrength. Recruiting females is actually more difficult than recruiting males. There are over 240,000 available MOS slots for females in which to enlist. However, females have a lower propensity to enlist (11% versus 17% for males, YATS89) and gravitate to the more attractive MOS such as medical (91), administrative (71), supply (76), and communication (31), usually the top four MOS for females annually. In FY89, 63% of the females were in AFQT CAT I-IIIA, all but a handful were high school graduates, 69% took four-year terms, 54% were white, and 40.5% were black. The female accession floor of 15,500 was exceeded by 4%.

Establishing a gender-neutral accession mission in the Army will likely lower, not raise, the number of females who enlist. Money for college and shorter terms of service are the two most important considerations for females, and without exceptional resources and attention to these attractions, and without proper "goaling" of the recruiter, female accessions would probably be significantly lower, perhaps by as much as 60%. The contract goal for females in FY88 was 13.4% with 13.7% achieved; FY89 had 16.3% as a goal and 18.1% of total contracts achieved.

Prior Service (PS). With the changes in Army structure there may be more need for PS to fill resulting holes in the structure, more as a result of unexpected losses of personnel than from structured reductions in MOS. However, the present PS requirement of 3,000 in FY91 and 2,000 for FY92 appears intact. The USAREC and CMF 18 initiative to identify specific requirements for special forces NCOs to reenlist is an example of what should be developed for MOS fill. Essentially, almost half (42% in FY87 and 49% in FY88) required retraining. All PS are AFQT CAT I-IIIA and the majority (90%) take the four-year term; 10.5% were females for FY89, 71.6% were white, and 23.9% were black for FY89.

Quality. Relevant, empirical research has clearly shown the need for quality (high school graduates scoring in the top fiftieth percentile on the AFQT) in the Army. Quality is a valid predictor of persistence, or likelihood to finish one's term of service, and of ability to train in the first term. As the body of research on the performance of high quality soldiers (Army Soldier Performance Research Project (SPRP), etc.) is disseminated, there will be greater understanding of the value of quality soldiers, and interest in job performance and aptitude and ability testing is likely to increase; second term and later performance has yet to be rigorously analyzed. Nevertheless, the amount of quality required is and will continue to be questioned by Army leadership, OSD, and Congress. Another issue is the logical leap required to go from individual soldier quality to unit performance and readiness; although the connections were articulated in the 7th annual report to Congress on linking enlistment standards to job performance, the argument will be viewed skeptically for some time. The accession quality improvements to date must not be lost; but asking for more quality, i.e., 67% I-IIIA, 95% HSDG and 4% or less CAT IV, is stretching the previous well-founded research arguments to the breaking point. The marginal performance benefit is difficult to quantify, and the cost effectiveness of the increased quality may not stand up to any close scrutiny.

The Future<br />

Since the endstrength reductions mandated by Congress for the 1991-95 downsizing<br />

precede USAREC structure cuts, USAREC is now recruiting fewer individuals with the<br />

same number of recruiters that had been dictated by higher annual qualitative and<br />

quantitative objectives. The decrease in recruiting difficulty is therefore only temporary;<br />

the, need for models to predict accession requirements based on endstrength goals is critical.<br />

Several accession and force structure models are in various stages of completion.<br />

ALENO (Alternate Enlistment Options) which is being developed by the Concepts Analysis<br />

Agency has the potential to provide future skill level one and two structure requirements<br />

to the MOS level of detail with the input of such variables as term of service, quality and<br />

accessions. ALENO also will translate endstrength requirements into accession inputs by<br />

quality and term of service. In addition, SRA Corporation has developed a prototype of<br />

its Army Force Structure Planning Model (AFSPM) which is aimed at determining<br />

accession requirements in the future considering quality and term of service inputs as well<br />

as retention/attrition rates. The ALENO and AFSPM models should be available within<br />

the next six months. Other models, perhaps less sophisticated, are being developed to<br />

answer the key concerns about accession missions for the future.<br />

In a rather straightforward manner the steady state accession mission can be determined<br />

by past accession ratios. Considering the variables of term of service mix, gender, quality<br />

and its collateral attrition/retention rates and accepting the fact that the new endstrengths<br />

place manpower management in completely foreign territory, the past ratio of enlisted<br />

accessions to endstrength has remained relatively static for many years. Applying ratios of<br />

mission to endstrength for the past six years to the 488,969 enlisted endstrength results in<br />

high and low estimates for the end state mission of 103,200 and 85,600 respectively. A<br />

reasonable estimate for the accession floor to support a 580,000 end state is therefore<br />

85,000. However, with higher quality projected in the outyears (less first term attrition)<br />

and lower average TOS mix, from the FY89 high of 3.88 years to the present average TOS<br />

for FY90 of 3.7 years (and dropping, mostly as the result of offering shorter terms to<br />

attract higher quality), the accession floor is expected to be closer to 90,000 by FY96. The<br />

relationships and effects of these variables to one another and to the accession mission are<br />

considerable. Establishing a Term of Service average objective for USAREC in the annual<br />

mission letters could be used to better align the force for the future downsizing. Although<br />

the recruiting market dictates what can be sold in contracts, an overall TOS mix average<br />

set in aggregate from the MOS requirements/TOS mix would foster more control over the<br />

longevity (and experience) of the force.<br />
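To make the arithmetic above concrete, the short sketch below (plain Python, not an official USAREC model) back-solves the implied mission-to-endstrength ratio range from the reported high and low estimates of 103,200 and 85,600 and applies it to the 488,969 enlisted endstrength; the ratios themselves are therefore illustrative assumptions rather than the historical USAREC figures.<br />

# Illustrative sketch of the steady-state accession calculation described above.
ENLISTED_ENDSTRENGTH = 488_969      # projected enlisted endstrength (from the text)

high_estimate = 103_200             # reported high estimate of the steady-state mission
low_estimate = 85_600               # reported low estimate

# Mission-to-endstrength ratios implied by the reported estimates
high_ratio = high_estimate / ENLISTED_ENDSTRENGTH    # about 0.211
low_ratio = low_estimate / ENLISTED_ENDSTRENGTH      # about 0.175

def accession_mission(endstrength: float, ratio: float) -> float:
    """Annual accession mission implied by an endstrength and a historical ratio."""
    return endstrength * ratio

print(f"implied ratio range: {low_ratio:.3f} to {high_ratio:.3f}")
print(f"accession floor:   {accession_mission(ENLISTED_ENDSTRENGTH, low_ratio):,.0f}")
print(f"accession ceiling: {accession_mission(ENLISTED_ENDSTRENGTH, high_ratio):,.0f}")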

Overall, the systems for establishing and monitoring accessions are in place and have been<br />

effective. The accession objectives, although highly dynamic, can be achieved. The<br />

downsizing, however, leads the Army into completely unmapped terrain which will greatly<br />

test the systems, the personnel managers, and, most notably, the soldiers presently in the<br />

Army.<br />



Reviewed by:<br />

D.S. Crooks<br />

Lieutenant-Commander<br />

Research Coordinator<br />

ETHNIC PARTICIPATION<br />

IN THE CANADIAN FORCES: DEMOGRAPHIC TRENDS<br />

Lieutenant (Naval) D.T. Reeves<br />


Paper presented at the 32nd Annual Conference of the<br />

Military Testing Association, Orange Beach,<br />

Alabama, U.S.A., November 5-9, 1990.<br />

Canadian Forces Personnel Applied Research Unit<br />

Suite 600, 4900 Yonge Street<br />

Willowdale, Ontario<br />

M2N 6B7<br />

Approved by:<br />

F.P. Wilson<br />

Commander<br />

Commanding Officer



ETHNIC PARTICIPATION<br />

IN THE CANADIAN FORCES: DEMOGRAPHIC TRENDS<br />

Lieutenant (Naval) D.T. Reeves<br />

Canadian Forces Personnel Applied Research Unit<br />

Willowdale, Ontario, Canada<br />

INTRODUCTION<br />

Background<br />

Given the current Human Rights climate and the dwindling labour<br />

force, the perceived lack of ethnic minority representation in the<br />

Canadian Forces (CF) is of concern. Socio-demographic trends portend a<br />

Canadian population marked by cultural diversity and an aging, dwindling<br />

labour force. In keeping with the multicultural policy of the Canadian<br />

government, and in response to the proposed expansion of the Primary<br />

Reserves, the CF is reviewing its representation of ethnic minorities.<br />

Currently there is a dearth of most ethnic minorities in the CF<br />

compared to their representation in the general population (Febbraro &<br />

Reeves, 1990). This under-representation is of concern to National<br />

Defence in its efforts to ensure that the cultural diversity of the<br />

Canadian population is reflected in the composition of the CF.<br />

Purpose<br />

The purpose of this paper is to review the present ethnic<br />

composition of both the Canadian population and the CF, and provide a<br />

preliminary examination of immigrant and visible minority recruitment in<br />

one large urban area.<br />

CANADIAN ETHNIC DIVERSITY<br />

Definitions<br />

In order to ensure concept clarity, the definitions of ethnic,<br />

immigrant and visible minority are as follows (Multiculturalism and<br />

Citizenship Canada, 1990):<br />

a. Ethnic. The culture or country of origin of an individual or<br />

one's ancestors.<br />

b. Immigrant. Anyone who is not a Canadian citizen by birth.<br />

c. Visible Minorities. Generally, persons other than Aboriginal<br />

peoples, who are non-Caucasian in race or non-white in colour.<br />

There are in excess of 100 ethnic groups in Canada and, excluding<br />

persons of English and French origins, this represents 25% of the Canadian<br />

population (75% of the Canadian population are of English, French or<br />

multiple English and French origins). Ethnic group concentrations vary<br />

from province to province and from city to city, and many ethnic groups<br />



record populations of fewer than 10,000 members, with a considerable number<br />

of groups registering 3,000 or less. In terms of large groups, there are<br />

only 11 which register more than 250,000 members excluding persons of<br />

British or French origins (Multiculturalism and Citizenship Canada, 1990).<br />

Canada's ethnic composition has changed substantially since the end<br />

of World War II. During the earliest period of Canadian immigration<br />

history, immigrants arrived largely from Britain and France. After 1945,<br />

they came increasingly from other countries in Western and Eastern Europe<br />

and from the United States. More recently, immigrants to Canada have come<br />

primarily from Asia, Africa, the Caribbean, and Central and South America<br />

- although between 1973 and 1980, Europe was still the single largest<br />

source of immigrants. Immigration levels have fluctuated between 30,000<br />

in 1945 and a peak of 282,000 in 1957. Current immigration projections for<br />

the 1990s are between 150,000 and 175,000 per year.<br />

The composition of ethnic populations varies from province to<br />

province. While people with British origins make up the largest<br />

proportion of the population in all provinces except Quebec, the size of<br />

this proportion varies from 90% in Newfoundland to 30% in Manitoba and<br />

Saskatchewan. Persons born outside of Canada currently comprise a larger<br />

part of the Canadian population than at any other time. This foreign born<br />

group, the majority (80%) of whom are Canadian citizens, now represents<br />

approximately 15% of the Canadian population. Most immigrants (53%) live<br />

in three cities: Toronto (32%), Montreal (12%) and Vancouver (10%),<br />

although the specific ethnic mixes for each of these cities are different<br />

(Multiculturalism and Citizenship Canada, 1990).<br />

The main source of information about ethnic populations has been<br />

the Canadian Census. In the past, census data have used such narrow<br />

indicators of ethnic origin as language spoken at home, mother tongue,<br />

paternal ancestry, and country of origin. The 1986 Statistics Canada<br />

definition of ethnicity is based upon ethnic origin as it refers to one's<br />

cultural ancestral roots, and may therefore reflect ancestry, nationality,<br />

race, language or religion, but should not be confused with citizenship or<br />

nationality in the strictest sense (Statistics Canada, 1988). In 1986,<br />

for the first time, the census recorded both single and multiple ethnic<br />

origins in order to establish a more accurate picture of the ethnic<br />

make-up of Canada's population. As a result, a substantial proportion<br />

(28%) of Canadians indicated multiple origins in their ancestry. Given<br />

the multiplicity and changing nature of the definition of ethnicity and<br />

the increasing numbers of multiple origin members, caution should be<br />

exercised not to use Canadian ethnic origins in any absolute way.<br />

Canada's largest ethnic origin groups are: 8.4 million British<br />

only, 6.1 million exclusively French origins, and 1.2 million both British<br />

and French. Almost 9.4 million Canadians indicated at least one ethnic<br />

origin other than British or French, and more than 6 million Canadians<br />

reported having non-British and non-French ethnic roots (Multiculturalism<br />

and Citizenship Canada, 1990). Table 1 is based on 1986 census data and<br />

shows the ten largest ethnic groups in Canada. The CF is dominated by<br />

members of British and French origin. The under-representation of most<br />

other ethnic groups is apparent by examining Table 2 census data<br />



Table 1<br />

The Ten Largest Ethnic Groups in Canada (1986 Census)<br />

                              Single       Multiple<br />
                              Origins      Origins       Total        %a      %b<br />
 1. English                  4,742,040    4,562,910    9,303,950     36.8    18.7<br />
 2. French                   6,087,310    2,027,945    8,115,255     32.1    24.1<br />
 3. Scottish                   865,450    3,052,605    3,918,055     15.5     3.4<br />
 4. Irish                      699,685    2,922,605    3,622,290     14.3     2.8<br />
 5. German                     896,715    1,570,340    2,467,055      9.7     3.5<br />
 6. Italian                    709,590      297,325    1,006,915      3.9     2.8<br />
 7. Ukrainian                  420,210      541,100      961,310      3.8     1.7<br />
 8. Dutch (Netherlands)        351,760      530,170      881,930      3.5     1.4<br />
 9. Polish                     222,260      389,845      611,745      2.4     0.9<br />
10. North American Indian      286,230      262,730      548,960      2.2     1.1<br />

Note. In all calculations, the figure used to represent the total Canadian<br />
population is 25,309,331.<br />
a Indicating single and multiple origins (based upon total response data).<br />
b Percent of Canadian population indicating a single origin.<br />

Table 2<br />

Representation of Selected Ethnic Groups in the CF (1986 Census)<br />

                          % of Canadian    % of CF Officer    % of CF Non-Officer<br />
Ethnic Origin               Population        Population           Population<br />
British                        25.0              29.0                 29.3<br />
French                         24.1              19.2                 26.8<br />
German                          3.5               2.9                  2.3<br />
Italian                         2.8                .4<br />
Ukrainian                       1.7               1.3<br />
Dutch                           1.4               1.6                  1.0<br />
Chinese                         1.4                .3                   .1<br />
South Asian                      .9                .2                   .1<br />
Black a                         1.0                .7                   .2<br />
Aboriginals b                   2.8               1.4                  2.6<br />
Visible Minorities b            6.4               1.8                  1.8<br />

a Black representation estimates based upon personal communication with the<br />
Director of Personnel Information Systems, 1990. b Based upon Employment<br />
and Immigration Canada statistics (1989).<br />



(Statistics Canada, 1988). Almost all of these figures are below their<br />

corresponding statistics in the general Canadian population with the<br />

Italian, Chinese, South Asian and Black groups being the most underrepresented.<br />

IMMIGRANT AND VISIBLE MINORITY RECRUITMENT<br />

Non-Commissioned Members - Regular Force<br />

A recent review of regular force non-commissioned (NCM) recruit<br />

applications (conducted at Canadian Forces Recruiting Centre (CFRC)<br />

Toronto in August 1990) indicated that 91.4% of applicants were Canadian<br />

born. Foreign born (immigrant) applications were only 8.6% of the total<br />

of those applying, and well below their Census Metropolitan Area Toronto<br />

representation level of 36% of the population. A further breakdown of<br />

foreign borns revealed that 55% of this applicant group (or 4.7% of the<br />

total applicant population) were members of a visible minority. This<br />

compares with a national visible minority representation of approximately<br />

6% and a Census Metropolitan Area Toronto representation of 13% (visible<br />

minority status of the Canadian born group could not be established using<br />

file information). Typically, the proportion of regular force NCM<br />

applicants who go on to become enrolled is approximately 30%. In this<br />

most recent review, however, only 4% of visible minority applicants and 8%<br />

of foreign born applicants were enrolled. Although these figures are<br />

based on active files, and therefore more will probably enrol before the<br />

end of 1990, they will still remain below the foreign born and visible<br />

minority representation levels for Census Metropolitan Area Toronto and<br />

the nation as a whole. Out of a total of 51 foreign born regular force<br />

NCM applicants, there were four enrolments; and while 28 of these foreign<br />

born applicants were visible minorities, only one was enrolled.<br />
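As a quick check on the figures just cited, the short calculation below (illustrative only) recovers the enrolment rates from the raw counts given in the text.<br />

# Enrolment rates implied by the counts reported above for CFRC Toronto
foreign_born_applicants, foreign_born_enrolled = 51, 4
visible_minority_applicants, visible_minority_enrolled = 28, 1

print(f"foreign born enrolment rate:     {foreign_born_enrolled / foreign_born_applicants:.0%}")          # roughly 8%
print(f"visible minority enrolment rate: {visible_minority_enrolled / visible_minority_applicants:.0%}")  # roughly 4%
# Both fall well short of the roughly 30% applicant-to-enrolment rate the text cites as typical.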

Officers - Regular Force<br />

Regular force officer applicants were 80.1% Canadian born and 19.9%<br />

foreign born, with 34.3% (or 6.8% of the total applicant population) of the foreign<br />

born group being visible minorities. As was noted with the NCM regular<br />

force applicants, both the foreign born and visible minority groups were<br />

well below their respective representative figures for Census Metropolitan<br />

Area Toronto. The pattern of low enrolment seen with regular force NCM<br />

candidates, however, was ameliorated for regular force officer applicants,<br />

of which 17.1% of foreign borns, 20.8% of visible minorities, and 16.7% of Canadian<br />

borns enrolled. Out of 70 foreign born regular force officer applicants,<br />

there were 12 enrolments; and while 24 of these foreign born applicants<br />

were visible minorities, five were enrolled.<br />

Non-Commissioned Members - Reserve Force<br />

In contrast to the regular force NCM recruiting, a review of NCM<br />

reserve force files indicated a much more positive picture, with Canadian<br />

born applicants representing 56%, foreign borns 44% (8% above Census<br />

Metropolitan Area statistics), and visible minorities 68.8% (or 30.1% of<br />

the total applicant population). NCM enrolment percentages for Canadian<br />

borns (45.9%) were highest, with enrolments of 33.8% and 32.1% for foreign<br />

borns and the visible minorities, respectively. It was also noteworthy<br />

that 12.6% of reserve force NCM applicants (22.7% of whom were enrolled)<br />



or 25.6% of foreign borns, were actually non-Canadians, i.e., new arrivals<br />

to Canada. Out of 154 foreign born reserve force NCM applicants, there<br />

were 52 enrolments; and while 106 of these foreign born applicants were<br />

visible minorities, 34 were enrolled.<br />

Officers - Reserve Force<br />

Reserve force officer applications roughly parallel those of regular<br />

force officer applications, with 76.8% being Canadian born, 23.2% foreign<br />

born and 38.5% members of a visible minority (or 3.9% of the total<br />

applicant population). Enrolments for the reserve force officers were<br />

similar for both Canadian borns (37.2%) and foreign borns (38.5%), and<br />

much higher for visible minority members (60%). Out of 13 foreign born<br />

reserve force officer applicants, five were enrolled; and while five of<br />

these foreign born applicants were visible minorities, three were enrolled.<br />

DISCUSSION<br />

The present review indicates that the CF regular force does not<br />

reflect the Canadian cultural mosaic. Amongst the under-represented<br />

groups, census figures indicate that Italian, Chinese, Black and South<br />

Asian origins tend to be the lowest. This under-representation, combined<br />

with substantial Canadian populations, makes these groups of special<br />

interest for more focussed research. In terms of recruiting initiatives,<br />

all four groups represent a notable, and as yet untapped, source of<br />

personnel (Chinese, Blacks and South Asians make up the largest and fastest<br />

growing visible minority groups in Canada).<br />

Although immigrant recruitment at CFRC Toronto does not necessarily<br />

reflect national recruiting norms, this preliminary review suggests that<br />

the NCM regular force does not attract immigrants and visible minority<br />

members at a representative rate. This situation is somewhat improved for<br />

regular force and reserve force officers, and although they remain significantly<br />

below those levels required for representativeness in Census<br />

Metropolitan Area Toronto, they are above both the 1986 census and national<br />

representation levels for CF visible minority representation. In contrast,<br />

findings for the reserve force NCM applicant group in Toronto suggest that<br />

this group is actually over-represented by both immigrants and visible<br />

minority members.<br />

The willingness of individuals from immigrant groups to apply for<br />

NCM reserve force service in relatively large numbers, while at the same<br />

time avoiding regular force application, suggests that the attitudes held<br />

by these groups regarding regular force employment may be substantially<br />

different. Since immigrant and visible minority members have not shown an<br />

antipathy to applying for military duty per se, it is important to determine<br />

the specific attitudes which are held by these groups which may be acting<br />

as barriers to regular force enrolment. Knowledge gained about these<br />

groups in terms of distinct ethnic attitudes toward the CF may be used to<br />

modify the recruiting approach to other under-represented groups for which<br />

study may be problematic (smaller numbers and wider geographic<br />

dispersion). These findings will have important consequences for future<br />

effective ethnic recruiting initiatives.<br />



REFERENCES<br />

Employment and Immigration Canada. (1989). Employment equity availability data report on designated groups.<br />
Technical Services, Employment Equity Branch. Ottawa: Minister of Supply and Services.<br />

Febbraro, A., & Reeves, D.T. (1990). A literature review of ethnic attitude formation: Implications for Canadian<br />
Forces recruitment (Working Paper 90-2). Willowdale, Ontario: Canadian Forces Personnel Applied Research Unit.<br />

Multiculturalism and Citizenship Canada. (1990). Multicultural Canada: A graphic overview. Policy and Research,<br />
Multiculturalism Sector, Multiculturalism and Citizenship Canada. Ottawa: Minister of Supply and Services.<br />

Statistics Canada. (1988). Census handbook. Ottawa: Minister of Supply and Services.<br />



1990 ARMY CAREER SATISFACTION SURVEY<br />

Timothy W. Elig’<br />

U.S. Army Research Institute<br />

To help personnel officials prepare for the eventual downsizing of the Army, the Chief of Staff,<br />

Army (CSA) directed that a survey of soldiers be conducted rapidly. “The downsizing of the U.S. Army is<br />

inevitable,” BG Stroup wrote in a memorandum requesting the Army Research Institute (ARI) to conduct a<br />

survey “. . . to determine the attitudes and concerns of our soldiers about the changes that will take place.”<br />

Even as events in Southwest Asia and Operation Desert Shield have dominated the news headlines,<br />

other important events have continued. Discussions about federal budget deficits, the end of the Cold War<br />

era, increased cooperation between the U.S. and the U.S.S.R., and German reunification are also front page<br />

news that lead to speculation about a reduction in the size of U.S. military forces.<br />

Soldiers may feel that their careers are being victimized by their contributions to the successful<br />

conclusion of the Cold War even as they are asked to risk their lives for their country. Many of the soldiers'<br />

concerns about their career and prospects for downsizing may in fact be made worse by recent events that<br />

have fostered even more uncertainty and curtailed the flow of information on the future make-up of the<br />

Army. Thus it is important to understand the morale of the force as it was just prior to Operation Desert<br />

Shield, in order to understand how soldiers are likely to respond to continuing career uncertainties.<br />

About this Survey<br />

The 1990 Army Career Satisfaction Survey (ACSS) was designed by ARI to answer several questions<br />

raised by the CSA and by DA personnel policy makers and analysts. Administration costs were paid by<br />

HQDA through the Army Research Office's Scientific Services Program.<br />

This survey was designed to provide an overview of soldiers’ attitudes, perceptions, and intentions<br />

concerning Army downsizing. While not all of these topics are discussed here, the survey included items on:<br />

career plans and intentions; advice to others on joining the Army; the Army experience as preparation for<br />

civilian jobs; organizational commitment and trust; reactions to the European thaw in the Cold War and to<br />

downsizing; expectations about what a smaller Army would mean and what the Army would be like over the<br />

next five years; soldiers' sources of information on downsizing and their trust in the sources; specific personal<br />

and family concerns about involuntary separation and resources needed to cope with unexpected separation;<br />

financial and emotional resources for separation; reactions to specific personnel management policies that<br />

could be implemented for downsizing; and propensity to accept “early-outs.”<br />

Thirty thousand soldiers (15,000 in enlisted, 10,000 in commissioned, and 5,000 in warrant ranks)<br />

were surveyed in June and July 1990. The main sample of 28,071 represents soldiers at all ranks countable<br />

toward the active strength of the Army on 31 March 1990, with the following exclusions: a) general officers,<br />

b) soldiers with less than 12 months of service, and c) soldiers in the process of separation or retirement.<br />

Another 1,929 soldiers who had been surveyed in previous efforts were also sent this survey in order to<br />

measure attitude changes over the last four years.<br />

Preliminary results from partial returns were provided to HQ, Department of the Army, in late July<br />

and early August. The final results presented here are based on 17,326 returned surveys from 6,997<br />

1 The findings in this report are not to be construed as an official Department of the Army position, unless so designated<br />

by other authorized documents.<br />



commissioned officers, 3,596 warrant officers, and 6,733 enlisted soldiers in the main sample. These data<br />

have been weighted to be representative of the Army.<br />

On the basis of both response rates and margins of error, this survey provides accurate attitude<br />

estimates for the entire Army and for relatively small subgroups. Response rates for the survey were<br />

extremely good. Completed surveys were returned by 58% of the main sample. When adjusted for postal<br />

non-delivery and late returns of completed surveys the overall response rate is 65% (80% of warrant officers,<br />

76% of commissioned officers, and 51% of enlisted).<br />

The overall margin of error is less than 1.3% indicating that 95% of the time a sample estimate of<br />

50% is within 1.3% of how the entire population would respond if surveyed. Margins of error are also quite<br />

small for each of the three main groups (1.3 for commissioned officers, 1.7 for warrant officers, and 1.6 for<br />

enlisted) and for subgroups of soldiers defined by categories such as gender or rank.<br />
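For readers who want to reproduce figures of this kind, the short sketch below applies the standard simple-random-sampling margin-of-error formula for a proportion near 50% at 95% confidence to the subgroup return counts reported above; the survey's published margins are slightly larger because they also reflect the sample weighting (design effects), which this sketch does not model.<br />

import math

Z_95 = 1.96  # two-sided 95% confidence multiplier

def margin_of_error(n: int, p: float = 0.5) -> float:
    """Approximate 95% margin of error, in percentage points, for a proportion p with n respondents."""
    return 100 * Z_95 * math.sqrt(p * (1 - p) / n)

returns = {
    "commissioned officers": 6_997,
    "warrant officers": 3_596,
    "enlisted": 6_733,
    "total main sample": 17_326,
}
for group, n in returns.items():
    print(f"{group}: +/- {margin_of_error(n):.1f} points (simple-random-sampling approximation)")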

Soldiers Are Positive About Themselves, Military Service, and Their Skills<br />

Soldiers are positive about military service for themselves. There is a strong core of committed<br />

soldiers (57% of commissioned officers, 63% of warrant officers, 45% of enlisted) who want to serve for 20<br />

or more years even if they could retire earlier. For many of these soldiers, the kind of work they most enjoy<br />

is available only or primarily in the military. This is most strongly characteristic of commissioned officers.<br />

Soldiers are confident of their own job performance; over three quarters of them said they were well<br />

prepared or very well prepared to perform the tasks in their wartime jobs. Three-quarters of soldiers also<br />

rated their units as combat ready. Soldiers’ confidence in their job performance and military skills is also<br />

reflected in their evaluation of civilian-relevant skills. When asked if they agreed or disagreed with the<br />

statement “I have been taught valuable skills in the Army that I can use later in civilian jobs” 70% expressed<br />

agreement. Soldiers were even more positive about the effects of their Army experiences on skills and<br />

characteristics that would help them obtain civilian jobs; 80% felt that the Army had a positive effect on<br />

specific job knowledge, skills, and abilities, while 86% felt that the Army had a positive effect on personal<br />

characteristics and attitudes. In another recent ARI survey, Benedict (1990) found that even first-term<br />

soldiers recognized the value of their Army experience with 64% to 77% rating the Army as having a positive<br />

effect.<br />

Despite the unsettling times of the first half of 1990, soldiers were positive about recommending<br />

military service to others. When asked what they would tell a good friend who asked for advice on seeing a<br />

military recruiter, soldiers were nearly eight times as likely to tell them that it was a good idea (46%) as to tell<br />

them that it was a waste of time (6%). The rest (47%) would tell their friend that it was up to him or her,<br />

apparently recognizing that military service is not for everyone. When asked specifically about enlistment in<br />

the Army, soldiers were twice as likely to recommend Army enlistment (60%) as enlistment in another<br />

service (27%). Only 13% would recommend not enlisting in any military service. Even on the very personal<br />

issue of their own children joining the military, soldiers were also fairly positive. Although less than one-third<br />

would like to see their daughter join the military, over two-thirds would like to see their son join the<br />

military at some point.<br />

Army Downsizing and Career Opportunities<br />

As we would expect, most officers (61% of commissioned and 55% of warrant) and many enlisted<br />

(40%) said that the chances of war with the Soviet Union were reduced by recent changes in East Germany,<br />

Hungary, Poland, and Czechoslovakia. However, there is still a perceived threat of war because of internal<br />

problems in the Soviet Union (economic problems, Lithuanian independence movement, ethnic unrest and<br />

20


clashes, etc.). It may be that the 61% of officers and 49% of enlisted who said yes to an increased chance of<br />

war with the Soviet Union were in fact just responding to this item as an increased chance of war, perhaps<br />

civil and not with the U.S.<br />

Although some soldiers said recent world events would probably affect what they do in the Army,<br />

the most likely impacts were seen in force size and promotion potential. As a result of recent world events<br />

48% said it was likely that demands on their time would increase. These soldiers could worry that mission<br />

statements will not be scaled back as resources and structure are cut, or that work details may replace<br />

training time for troops as many experienced during the draw-down of the early 1970’s. Further, two-thirds of<br />

soldiers (79% of commissioned officers, 69% of warrant officers, and 65% of enlisted) said it is likely that<br />

promotion opportunities will decrease as a result of recent world events. As one officer commented, "How<br />

ironic that the very soldiers who brought about this peace dividend are the ones who have to suffer.”<br />

Reductions in force requirements have decreased soldiers’ confidence in their ability to be promoted<br />

and to have the opportunity to complete at least 20 years of Army service. Only 46% of commissioned<br />

officers, 59% of warrant officers, and 52% of enlisted were confident that as the Army becomes smaller they<br />

would be able to stay in the Army and be promoted on or ahead of schedule.<br />

Concerning the size of the future Army, soldiers were asked to predict the likelihood of several<br />

percentage reductions, voting for as many size cuts as they felt were likely. Three-fourths of soldiers<br />

believe today’s Army will be cut by up to 10%; over one-half believe the cuts will be about 20%; and nearly<br />

one-third believe the cuts will be at least 30%. Commissioned officers voted the reduction as likely to be<br />

considerably larger than did the enlisted and the warrant officers.<br />

For the majority of soldiers, interest in<br />

serving in the Army is not strongly influenced by<br />

the size of force reductions that may be imposed.<br />

However, of the 40% of officers whose interest in<br />

serving is influenced by the size of the force, three-fourths<br />

are less interested in serving in a pared-down<br />

Army and only one-fourth are more<br />

interested (Figure 1). This may well be related to<br />

fears about quality and career opportunities. Less<br />

than half of officer, warrant and enlisted soldiers are<br />

confident that the best officers, NCOs, and junior<br />

(skill level 1) enlisted will stay as the Army becomes<br />

smaller.<br />

While the question was not asked directly,<br />

it is possible that opportunities to exercise<br />

leadership may also be seen as decreasing in a<br />

smaller Army, especially by those less interested in<br />

serving in a smaller Army. It may be important to<br />

point out to those interested in developing their<br />

leadership skills that requirements for creative,<br />

effective leadership are likely to increase during a<br />

transition; and that opportunities to learn these<br />

difficult leadership skills will remain high even in a<br />

smaller Army. It is also likely that these skills will<br />

be in much greater demand in a civilian sector<br />

facing its own pressures for streamlining and<br />

efficiency.<br />




Although less than 10% say they are leaving due to potential changes and cuts, many more are<br />

concerned, and as many as 20% think it was a mistake to stay beyond their original obligation. Further, 30%<br />

say it would take a lot to keep them beyond their current obligation. While only 12% have applied for a job<br />

in the last year, 41% have sought information about civilian jobs in case they leave the Army.<br />

One-fourth expect to be RIFed. Even<br />

more expect to be offered an early out (34% of<br />

commissioned officers and 44% of enlisted). See<br />

Figure 2. At least one-half were more concerned<br />

than a year ago about their long-term opportunities<br />

in the Army (62%), the kind of work they will go<br />

into when they leave the Army (56%), whether or<br />

not they would be able to quickly get a civilian job<br />

if needed (62%), and financial burden on self and<br />

family should they have to leave the Army<br />

unexpectedly (69%). Debt exceeds available<br />

savings for enlisted and warrant officers. One-fourth<br />

would also lose other family member income<br />

because of relocation if separated unexpectedly.<br />

Over three-fourths reported that it would be<br />

difficult or very difficult financially to be<br />

unemployed for two or three months.<br />

Soldiers are also pessimistic about what the<br />

future holds. Compared to how satisfied they said<br />

they are today, fewer soldiers expect to be satisfied<br />

with the Army of 5 years from now in respect to<br />

job security (57% vs 38%), benefits (57% vs 43%),<br />

overall quality of life (49% vs 39%), and<br />

opportunities to do work liked (49% vs 41%).<br />

Officers also expect to be less satisfied with pay<br />

and allowances (55% vs 44% for commissioned<br />

officers and 43% vs 38% for warrant officers). The same percentage (38%) of enlisted are satisfied with pay<br />

and allowances now as expect to be satisfied with pay and allowances five years from now. Beliefs about the<br />

future may determine interest in remaining in the Army. Expected satisfaction with future pay, benefits, job<br />

security, quality of life, and opportunities to do work one likes are each correlated with being more<br />

interested in serving in a smaller Army.<br />

Further, soldiers are even more likely to see the Army as suffering from a rapid draw-down than to<br />

see themselves as suffering. More soldiers agreed that the Army will cut strength so quickly that readiness<br />

(62%) and morale (68%) will suffer than agreed that they (36%) or their family (41%) will suffer.<br />

Information Flow<br />

Three-fourths of soldiers said they are not getting the right amount of information on future<br />

personnel reductions in the Army; in fact 15% of soldiers said they are getting no information. They tend to<br />

credit the Army Times or other media with providing what information they do obtain. One soldier<br />

commented that “Our main source of information for issues on RIF, closure of bases, etc. is the mass<br />

media.” Only about one-half of soldiers think information on cuts in Army strength is reliable when obtained<br />

from the chain of command; one-third said they did not get information on cuts from the chain of command.<br />

Overall, 57% think information on the future of the Army that they receive from the Army itself (chain of<br />

command, post newspapers, etc) is accurate while 40% think it is timely. Roughly five percent of the<br />



respondents were so concerned about this lack of information that they wrote comments about it on the<br />

questionnaire.<br />

Although 3% felt they were getting too much information on future personnel reductions, the<br />

overwhelming majority of soldiers want more information from the chain of command and asked for it in<br />

their comments: "Please keep us informed! Do not keep us in suspense" and "I think families of soldiers<br />

should have more information about their spouses’ careers and pay raises, early outs, pay cuts etc.”<br />

Of course, what they really want is for the dust to settle and for all the decisions to have been made.<br />

As one soldier put it: “I feel that the Army has hurt morale by coming out and saying the Army must<br />

decrease, way before it is time.” Another soldier expressed it in this way: “I feel that the military is moving<br />

kind of fast and who knows what the future holds.” Other comments reflect a perception by some that the<br />

cuts are being made already: "Forget the go slow method . . . Make the cuts/RIFs in one year and get it over<br />

with . . . using promotion boards in lieu of RIFs is having a terrible effect on morale.”<br />

Concerns and Needs if<br />

Involuntarily Separated<br />

While most of the questionnaire<br />

dealt with the current attitudes of Army<br />

soldiers, it also contained questions on<br />

what soldiers’ concerns would be if<br />

involuntarily separated, as well as what help<br />

they would need in transitioning to a new<br />

career. Overall, if involuntarily separated,<br />

more than one-half of personnel would be<br />

very concerned or extremely concerned<br />

about separation pay (70%), health and<br />

dental care (63%), securing a job (61%),<br />

unemployment compensation (60%), and<br />

health insurance (58%). Further, more<br />

than one-third were very or extremely<br />

concerned about advancing their education<br />

(48%), finding a place to live (46%), child<br />

care and schools (37%), and spouse<br />

employment (36%). Because some<br />

concerns may not be widespread, but may<br />

be vitally important to those who do have<br />

the concern, soldiers were also asked what<br />

were their three most important concerns.<br />

These most important concerns (adding<br />

together the three selections) are securing<br />

a job (over 80%), finding a place to live<br />

(45%), separation pay (over 30%), and<br />

health and dental care (about 30%). Also,<br />

while only 37% and 36% of soldiers<br />

overall are very or extremely concerned<br />

about child care/schools and spouse employment, the percentages jump to 57% and 50%, respectively, if we<br />

consider only those to whom these questions apply. And while only 28% of all enlisted are very or extremely<br />

concerned about enrollment in GI Bill by paying $1200, 46% of those who are not already eligible are very<br />

or extremely concerned about this. (Note that officers were not asked about Montgomery GI Bill benefits.)<br />

If they were to be involuntarily separated, soldiers saw a variety of job search tools as important<br />



including: labor market information and job banks, time-off (not charged to leave) for interviews and<br />

relocation planning, training and counseling. Specific needs as well as preferences for where services are<br />

provided will become part of the information base used in planning transition services.<br />

Personnel Policies and Other Issues<br />

A major section of each form of the questionnaire (enlisted, commissioned officer, and warrant<br />

officer) dealt with specific personnel policies and concerns. These issues are being examined by the<br />

appropriate divisions of the Office of the Deputy Chief of Staff for Personnel, with continuing support from<br />

ARI.<br />

Work is continuing on demographic differences and analyses of such issues as soldiers' career intentions<br />

and perceived vulnerability to involuntary separation. We are also examining the issue of where soldiers<br />

would move to if involuntarily separated. This affects how much the Army would have to pay for<br />

unemployment compensation and could affect recruiting markets as well. The data are also being made<br />

available to the Army War College and to Army officers at the Naval Postgraduate School for student<br />

research.<br />

Several of the survey questions were previously used in ARI research efforts. Most importantly,<br />

many of the career intention and commitment items were contributed by ARI's Longitudinal Research on<br />

Officer Career (LROC) project (Carney, In Preparation). ARI's Army Family Research Program (AFRP)<br />

contributed items on readiness, morale, and family situations (Bell, In Preparation). These research groups<br />

at ARI are currently including these items in their analyses.<br />

BIBLIOGRAPHY<br />

Baker, T. (In Preparation). Potential Geodemographic Effects of Army Force Reduction: Where Soldiers Plan to<br />
Move if Separated. Alexandria, VA: U.S. Army Research Institute.<br />

Bell, B. (In Preparation). The Army Family Research Program (AFRP) Survey. Alexandria, VA: U.S. Army Research<br />
Institute.<br />

Benedict, M. E. (1990). The 1989 ARI Recruit Experience Tracking Survey: Descriptive Statistics of NPS (Active)<br />
Army Soldiers (ARI Research Product 90-16). Alexandria, VA: U.S. Army Research Institute.<br />

Carney, C. (In Preparation). Longitudinal Research on Officer Careers. Alexandria, VA: U.S. Army Research Institute.<br />

Elig, T. W., & Martell, K. A. (1990, October). The 1990 Army Career Satisfaction Survey (ARI Special Report).<br />
Alexandria, VA: U.S. Army Research Institute.<br />

Elig, T. W. (In Preparation). The 1990 Army Career Satisfaction Survey: Descriptive Statistics for Commissioned<br />
Officer, Warrant Officer, and Enlisted Soldiers. Alexandria, VA: U.S. Army Research Institute.<br />

Elig, T. W. (In Preparation). The 1990 Army Career Satisfaction Survey Technical Manual. Alexandria, VA: U.S.<br />
Army Research Institute.<br />

Elig, T. W., Benedict, M. E., & Gilroy, C. L. (1990, June). ARI 1990 Employer Survey Summary Report (ARI Special<br />
Report). Alexandria, VA: U.S. Army Research Institute.<br />

Hay, M. S., & Middlestead, C. G. (In Preparation). Army Force Reductions, Soldiers' Career Intentions, and<br />
Perceptions of Vulnerability. Alexandria, VA: U.S. Army Research Institute.<br />

24


The Use of Artificial Neural Networks<br />

in Military Manpower Modeling<br />

Jack R. Dempsey, D.A. Harris, and Brian K. Waters<br />

Human Resources Research Organization<br />

A new idea is delicate. It can be killed by a sneer or a yawn; it can be stabbed to death by a<br />

quip, and worried to death by a frown on the right man's brow.<br />

--Charlie Brower--<br />

The military has been a trailblazer in the realm of manpower modeling and personnel measurement. According<br />

to an old saying, “Necessity is the mother of invention.” Well, due to the formidable recruiting and selection tasks<br />

facing the Services, pioneering efforts have been made and continue to push the military to or past the state of the<br />

art. There are again innovative techniques which the Services are (or should be) considering to aid military selection<br />

strategies.<br />

<strong>Military</strong> selection policies are a topic of high level interest and scrutiny. Each of the Services sets standards<br />

for selection on the basis of citizenship, age, moral character, physical fitness, aptitude, and education credential.<br />

The latter two entry criteria are the most visible screening mechanisms and the ones which the Department of<br />

Defense (DOD) uses to define and report recruit quality levels to Congress and other interested parties. Aptitude, as<br />

measured by composite scores from the Armed Services Vocational Aptitude Battery (ASVAB), is used to predict<br />

military technical school performance. Education credentials are used for adaptability screening. That is, they assess<br />

the likelihood of attrition, or positively, that a recruit will complete an obligated term of service. Both aptitude and<br />

education credential standards have been called into question of late by Congressional watchdogs. Actually, the<br />

flurry of interest in aptitude standards dates back to 1980 when Congress learned that between 1976 and 1980 the<br />

ASVAB norms were incorrect. This resulted in accepting hundreds of thousands of recruits who did not meet the<br />

intended minimum aptitude standards. Furthermore, Congress learned, much to its dismay, that enlistment standards<br />

were validated against training performance not actual job performance. Congress continues to inquire: What is the<br />

relationship between aptitude and job performance? And, how much quality is needed to ensure adequate job<br />

performance? A Herculean, on-going, multi-year job performance measurement (JPM) project has provided answers<br />

to the first question while the answer to the second is in progress.<br />

More recently, education standards have come under attack by Congress and educational lobbying groups.<br />

Currently the plethora of credentials is categorized into one of three tiers based upon attrition rates. Each tier has<br />

differential aptitude standards and recruiting preferences. While education credential is the single best predictor of<br />

attrition, objections to this policy revolve around the fact that many individual members of the non-preferred tiers<br />

are successful in service and are therefore wrongfully denied enlistment on the basis of group membership.<br />

The dual problems of linking quality requirements to job performance and implementing more equitable<br />

adaptability screening methods require innovation. Classical statistical techniques may not provide the answer.<br />

These military selection questions require more sophisticated and less familiar modeling techniques. Just how do<br />

techniques such as neural networks complement the more common modeling procedures? The performance prediction<br />

and attrition screening applications described below provide at least a little food for thought and may suggest that<br />

a more in-depth look is required.<br />

Linking Standards to Job Performance<br />

This project’s purpose is to bring the Joint-Service Job Performance Measurement/Enlistment Standards (JPM)<br />

Project to fruition. This will be accomplished through four lines of endeavor. First, the military’s recruit selection<br />

measures (e.g., ASVAB) must be related to job performance in virtually all occupations. Second, a methodology<br />

must be developed so that empirical data can inform the setting of enlistment standards. That is, the expected job<br />

performance of recruits, over their first term of enlistment, should match total job performance requirements. Third,<br />

improved trade-off model(s) must be developed so that force quality requirements--based on empirically grounded<br />

job performance requirements--are considered along with related costs in the determination of enlistment standards.<br />

Finally, the Services’ personnel allocation systems must be made responsive to empirical information about the<br />

performance requirements of particular jobs.<br />

The data used for the Linkage Project consisted of 8,464 individual service members in 24 different occupations<br />

who had been administered hands-on performance tests as a part of the JPM Project. Each record contained ASVAB<br />



subtest scores, time-in-Service, and education credential or diploma status.<br />

Job characteristics were obtained from an existing Department of Labor data base and represent an assortment<br />

of information about civilian jobs. The data base contained ratings on work complexity; training times; worker<br />

aptitude, temperament, and interest requirements; physical demands; and environmental conditions. Over 12,000 jobs<br />

were rated as part of a massive job analysis project culminating in the publication of the Dictionary of Occupational<br />

Titles (DOT) (U.S. Department of Labor, 1977).<br />

Direct ratings of the occupational characteristics of military jobs were not available; however, their ratings<br />

were estimated from matching equivalent civilian jobs. The results of the Military Occupational Crosscode Project<br />

(Lancaster, 1984; Wright, 1984) were used to determine military-civilian equivalence. Subsequent to ascribing<br />

civilian job characteristics to the population of military jobs, the job characteristics were factor analyzed. Initially, a five factor orthogonal<br />

rotation was adopted as the most appropriate, interpretable, and parsimonious solution. (These factors are referred<br />

to as PC1-5 in the network that follows).<br />

Regression Approach<br />

Linking standards to job performance requires a performance prediction model. Using performance scores for<br />

24 jobs from the JPM project, individual characteristics, and AFQT scores, we examined a model in which each job<br />

was allowed to have its own intercept. This model was the baseline against which other models and techniques were<br />

compared. This model gives the performance Pij of individual i in job j as:<br />

Pij = αj + βj Tij + γj Eij + δj Xij + εij<br />

where:<br />

Pij = hands-on performance test score,<br />

Tij = ASVAB technical composite score,<br />

Eij = education,<br />

Xij = experience.<br />

Note that the subscript j for job on αj, βj, γj and δj implies that there is a different coefficient for each job, which<br />

are treated for the moment as fixed.<br />

The coefficients in this model were estimated with ordinary least-squares by using a vector of dummy variables<br />

D for jobs and entering D, T, D x T, E, D x E, X, and D x X. The overall R² for this model is R² = .596 with<br />

93 degrees of freedom for the model and 8370 residual degrees of freedom. Though a substantial amount of<br />

variance was accounted for, this model is not generalizable to jobs outside of the particular ones included in the<br />

model.<br />

To ensure generalizability, we examined a model in which job characteristics were used to predict various job-specific<br />

effects. This is a fixed effects two-level model which uses job characteristics, expressed as a vector Mj, to<br />

predict the job-specific intercepts and coefficients of the individual characteristics. The two-level form of the model<br />

was expressed as:<br />

Pij = αj + βj Tij + γj Eij + δj Xij + εij,<br />

with αj = πα Mj, βj = πβ Mj, γj = πγ Mj, and δj = πδ Mj,<br />

where πα, πβ, πγ, and πδ are row vectors of regression coefficients. The fixed effects model actually estimated<br />

was:<br />

Pij = α + β Tij + γ Eij + δ Xij + Λ Mj + Β (Mj x Tij) + Γ (Mj x Eij) + Δ (Mj x Xij) + εij,<br />

where<br />

Λ = (Λ1, ..., Λ5), Β = (Β1, ..., Β5), Γ = (Γ1, ..., Γ5), and Δ = (Δ1, ..., Δ5)<br />

are vectors of regression coefficients. The R² for this model is .350. This is considerably smaller than the R² of<br />

approximately .59 achieved when intercepts were completely unconstrained. This suggests that the job characteristic<br />

factor scores explain a portion, but by no means all, of the variability in the job-specific intercepts.<br />
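A minimal sketch (not the authors' code) of the two specifications compared above is given below; it fits Model I (job dummies and their interactions with T, E, and X) and Model II (the five job-characteristic factor scores PC1-PC5 and their interactions) on synthetic stand-in data, so all column names and data-generating values are illustrative assumptions.<br />

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n, n_jobs = 8_464, 24
job = rng.integers(0, n_jobs, n)
pcs = rng.normal(size=(n_jobs, 5))            # job-characteristic factor scores, one row per job
df = pd.DataFrame({
    "job": job.astype(str),
    "T": rng.normal(100, 10, n),              # ASVAB technical composite
    "E": rng.integers(10, 17, n),             # education (years)
    "X": rng.uniform(0, 4, n),                # experience (years)
})
for k in range(5):
    df[f"PC{k + 1}"] = pcs[job, k]
# Synthetic hands-on performance score with job-specific intercept shifts
df["P"] = 20 + 3 * pcs[job, 0] + 0.3 * df["T"] + 1.5 * df["E"] + 2 * df["X"] + rng.normal(0, 5, n)

# Model I: D, T, D x T, E, D x E, X, D x X (unconstrained job intercepts and slopes)
model1 = smf.ols("P ~ C(job) * (T + E + X)", data=df).fit()
# Model II: job characteristics Mj and their interactions with T, E, X
model2 = smf.ols("P ~ (PC1 + PC2 + PC3 + PC4 + PC5) * (T + E + X)", data=df).fit()

print(f"Model I  R^2: {model1.rsquared:.3f}")
print(f"Model II R^2: {model2.rsquared:.3f}")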

The Neural Network Approach<br />

Having witnessed the rather large degradation in variance explained between Model I and Model II, i.e.,<br />

R²=.595 to R²=.350, a neural network paradigm was investigated. Once the candidate explanatory variables were<br />

determined from the second model, the next step was to construct a neural network capable of analyzing the problem.<br />

Actual construction involved the following five steps which specified:<br />

o network type<br />

o number of nerodes in the output and hidden layers<br />

o training and cross validation gamples<br />

o transfer function at each layer and global error function<br />

o scaling, learning, momentum, epoch size parameters<br />

Network Architecture<br />

Since the problem involved a (hetero-associative) mapping of continuous, dichotomous, and polytomous<br />

explanatory variables to a bounded continuous criterion measure of hand-on-performance, a forward-feed backward<br />

error propogation network was chosen.<br />

Ostensibly, the single output “neurode” was hand-on-performance test score. Because the number of neurodes<br />

in the hidden-layer of the feedforward network determines the complexity of the function the network is capable of<br />

mapping, 26 was determined to yield a sufficiently complex network. Notably, it has been shown that any<br />

continuous function or ”.,.mapping can be approximately realized by Rumelhart-Hinton-Williams’ multilayer neural<br />

network with at least one hidden layer whose output functions are sigmoid functions (Funahashi, 1989; Homik,<br />

1989).”<br />

The data were randomly split 60140 into two sets. The first (N=5,078) was used to train the network and the<br />

second (N=3,386) was used to validate the network. The transfer function for the output neurode was logistic, while<br />

the transfer functions for the hidden neurodes were hyperbolic. Although any error function which is continuously<br />

differentiable could have been used, we selected the squared deviation between the observed and prcdictcd output<br />

values. Graphically, the network is shown in Figure 1 below.<br />

Figure 1. Project Linkage cumulative back-propagation network with hyperbolic transfer functions.<br />


The data were scaled to network values between -0.85 and +0.85. Scaling ensured that the neurodes would not<br />

become saturated at the transfer function extremes. When this occurs, learning ceases because the gradient of the<br />

error function approaches zero asymptotically. To guard against this, a nominal offset of 0.005 was added to each<br />

derivative. Finally, the learning coefficient was initially set at 0.9 and gradually reduced as learning progressed.<br />

A momentum term of 0.5 was initially used and also gradually reduced. The learning rule chosen was the<br />

normalized cumulative delta rule with an epoch size of five hundred.<br />
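As a rough modern re-creation of the architecture and training scheme just described, the sketch below (PyTorch, which obviously postdates the original implementation) builds a single hidden layer of 26 hyperbolic-tangent neurodes feeding one logistic output neurode, trains with squared-error loss and momentum, and gradually reduces the learning rate; the input width and the synthetic data are assumptions, and the normalized cumulative delta rule with an epoch size of 500 is only approximated here by plain batch gradient descent.<br />

import torch
from torch import nn

torch.manual_seed(0)
n_inputs = 9                                 # e.g., ASVAB composite, education, experience, PC1-PC5 (assumed width)
X = torch.randn(5_078, n_inputs)             # stand-in for the 60% training split
y = torch.rand(5_078, 1)                     # hands-on scores rescaled into the output range

net = nn.Sequential(
    nn.Linear(n_inputs, 26),   # 26 hidden neurodes
    nn.Tanh(),                 # hyperbolic transfer function
    nn.Linear(26, 1),
    nn.Sigmoid(),              # logistic transfer function at the output neurode
)
loss_fn = nn.MSELoss()                                               # squared-deviation error criterion
optimizer = torch.optim.SGD(net.parameters(), lr=0.9, momentum=0.5)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)  # gradually reduce the learning rate

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    optimizer.step()
    scheduler.step()
print(f"final training MSE on synthetic data: {loss.item():.4f}")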

Results<br />

Once the network was trained on approximately one million random presentations of observations, the network<br />

was then evaluated against the cross-validation sample. The results are presented below.<br />

                     R²      R<br />
Model I             .595    .77<br />
Model II            .350    .59<br />
Neural Network      .574    .76<br />

A Chi Square Goodness-of-Fit test was then performed and the hypothesis that the predicted and observed came<br />

from different populations could be rejected at the .95 level of confidence. Notably, the neural network cross-validated<br />

coefficient practically matches the unvalidated coefficient for Model I.<br />

The results achieved using a neural network to predict job performance were far superior to regression based<br />

approaches when generalizability is considered. The above results provide an impetus to expand the investigation<br />

of neural networks to the Adaptability Screening Project.<br />

Adaptability Screening Project<br />

The Adaptability Screening Profile (ASP) project has been described in detail in previous work (Sellman, 1989;<br />

Trent, 1987). Succinctly, the purpose of the project is to: (1) develop a biographic instrument capable of assessing<br />

an individual’s propensity to adapt to military life; (2) determine its operational utility in predicting an individual’s<br />

likelihood of successful completion of an initial term of enlistment; and (3) utilize the instrument as part of an<br />

enlistment screening procedure. Because biodata instruments are assumed to be fakable and/or coachable, as part<br />

of any ultimate implementation, there must be a mechanism to detect and correct for response pattern distortion.<br />

Certainly, this is a difficult task. As reported by Walker (1989), the Army’s previous attempt at large-scale biodata<br />

implementation of the Military Applicant Profile (MAP) failed. The failure resulted from several factors, including<br />

lack of an on-going score monitoring system capable of detecting response pattern distortion. For whatever reason,<br />

the MAP validity for predicting attrition fell to zero, that is, it became useless for decision-making about individual<br />

applicants. To prevent a full-scale implementation of ASP from suffering a similar fate, NPRDC directed HumRRO<br />

to develop a score monitoring system. The purpose of the ASP score monitoring system is to: (1) deter faking and<br />

coaching; (2) detect response pattern distortion if and when it occurred; and (3) to estimate the effects of such<br />

distortion so that statistical adjustments could be made to counter the effects of the distortion. Because a more<br />

complete discussion of the score monitoring system is contained in Waters (1989), the following discussion will<br />

concentrate on attacking the distortion problem using neural networks.<br />

Armed Services Applicant Profile<br />

The ASAP data base consists of 120,175 applicants to the four Services. Administration occurred during the<br />

three-month period commencing December 1985 and ending February 1986. Of the applicants, 55,675 were<br />

accessed. These records form a cohort file which will be appended with additional demographic data elements from<br />

the Military Enlistment Processing Reporting System (MEPRS) and the Defense Manpower Data Center Edited<br />

Enlisted Active Duty Master file. Each record will be updated with the inter-Service separation code (ISC) which<br />

will form the basis for 48-month criterion development. Faking/coaching will be simulated by intentionally<br />

distorting response patterns to varying degrees.<br />

Chsskal Approach to Response Pattern Disrtortion<br />

One approach to detecting response distortion is to develop a regrcsion based prcdiclion system which rclatcs<br />

background characteristics of applicants with point estimates of ASP score means, variances, skew and kurtosis<br />

indices. Demographic information on race, gender, education, home of record, age, number of dcpcndcnts and many<br />

other variables are available as predictors of ASP score. Accurate prediction would permit analysts of how well<br />

operational ASP data bchavcd as compared with “norming” group data, for the total group as well as subgroups.<br />

28


In attempting to relate ASP score to demographic characteristics, an ordinary least squares regression was run. The results yielded an R-squared of .213 and a root mean square error of 9.198. The large standard error provides the motivation to determine whether these results can be improved upon using a neural network approach that attempts to map responses as opposed to total scores.
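For reference, an ordinary least squares baseline of this kind takes only a few lines of linear algebra. In the sketch below, the demographic predictor matrix and score vector are simulated placeholders rather than the ASAP data, so the printed R-squared and RMSE are whatever the simulated data happen to produce.

    import numpy as np

    # Placeholder design matrix of demographic predictors (e.g., coded
    # race, gender, education, age) and ASP total scores.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 7))
    y = 40.0 + X @ rng.normal(size=7) + rng.normal(0.0, 9.0, size=1000)

    # Ordinary least squares with an intercept column.
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    fitted = X1 @ beta

    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(ss_res / len(y))   # root mean square error
    print(f"R^2 = {r_squared:.3f}, RMSE = {rmse:.3f}")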

Neural Network Approach to Distortion Detection<br />

The network paradigm currently being investigated is the cumulative backward error propagation network. The network has fifty outputs representing individual responses. The hidden layer includes 120 neurodes, each using a hyperbolic transfer function. The inputs include the same demographic information that was hypothesized to be related to the ASP score in the earlier regressions. The network is shown graphically in Figure 2; an illustrative sketch follows the figure.

Figure 2. Response Pattern Distortion Detection Network (Cumulative Backward Error-Propagation Network).
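The Python sketch below illustrates the general form of a single-hidden-layer backward error propagation network with a hyperbolic tangent hidden layer of 120 units and 50 outputs, as described above. It is not the NPRDC implementation; the input size, learning rate, epoch count, and training data are placeholder assumptions used only to make the example run.

    import numpy as np

    def train_backprop(X, Y, hidden=120, lr=0.05, epochs=200, seed=0):
        """One-hidden-layer backpropagation network: tanh hidden units,
        sigmoid outputs, trained by batch gradient descent on squared error."""
        rng = np.random.default_rng(seed)
        n_in, n_out = X.shape[1], Y.shape[1]
        W1 = rng.normal(0.0, 0.1, size=(n_in, hidden))
        b1 = np.zeros(hidden)
        W2 = rng.normal(0.0, 0.1, size=(hidden, n_out))
        b2 = np.zeros(n_out)
        for _ in range(epochs):
            # Forward pass.
            H = np.tanh(X @ W1 + b1)                   # hyperbolic transfer function
            O = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))   # predicted item responses
            # Backward pass (error propagation).
            d_out = (O - Y) * O * (1.0 - O)
            d_hid = (d_out @ W2.T) * (1.0 - H ** 2)
            # Cumulative (whole-batch) weight updates.
            W2 -= lr * H.T @ d_out / len(X)
            b2 -= lr * d_out.mean(axis=0)
            W1 -= lr * X.T @ d_hid / len(X)
            b1 -= lr * d_hid.mean(axis=0)
        return W1, b1, W2, b2

    # Placeholder data: demographic inputs and 50 binary item responses.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(256, 12))
    Y = (rng.random(size=(256, 50)) < 0.5).astype(float)
    weights = train_backprop(X, Y)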

Due to the number of calculations involved in training the above network to recognize response pattern distortion, a mainframe version of the cumulative backward error propagation neural network has been written for the IBM 4381 and implemented at the Navy Personnel Research and Development Center (NPRDC). The current implementation is written in FORTRAN 77. Other network paradigms, such as Grossberg's Outstar and counter-propagation, are in the process of being added. Although results are extremely encouraging, it is premature to report them at this time.

Summary<br />

Certainly, neural network technology is still in its youth; nevertheless, it has experienced significant growth in recent years and the momentum shows no signs of slowing. Initially, the technology had a "black box" image, but recent articles such as those by Hornik and Funahashi demonstrate that neural networks are well founded in mathematical theory and have statistical roots. That is to say, a simple ordinary least squares regression can be expressed as a neural network, albeit a simple one. Neural networks have the potential for providing unique approaches and insights into heretofore intractable problems. In the context of military manpower research, the jury is not yet deliberating, because all the evidence has not been presented. But when it is, we may find we have new answers to old problems.

29


REFERENCES

Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3), 183-192.

Hornik, K.


Hispanics in Navy’s Blue-Collar Civilian Workforce: A Pilot Study1<br />

Jack E. Edwards, Paul Rosenfeld, Patricia J. Thomas<br />

Navy Personnel Research and Development Center<br />

San Diego, CA<br />

The 1964 Civil Rights Act, Title VII, mandated equal employment opportunity (EEO) for all persons regardless of race, color, creed, national origin, or gender. Congress amended the Civil Rights Act in 1972 to require most federal agencies to have programs that would help implement EEO policies. During the quarter of a century since the passage of the Civil Rights Act, Blacks, as a group, have made significant inroads into both previously segregated organizations and segregated jobs within integrated organizations. Hispanics, however, have not been as successful in attaining employment opportunities.

The Department of the Navy has been unable to attract Hispanics in proportion to their representation in the U.S. labor force. In 1980, Hispanic representation in the civilian Navy work force was 3.2% compared to 6.4% in the total U.S. civilian labor force (CLF). Since 1980, the Navy's civilian Hispanic representation has increased by only 0.3 percentage points to 3.5%, while Hispanics in the CLF have increased 1.8 percentage points to 8.2%. Moreover, the Navy's 3.5% rate of Hispanic employment in civilian positions lags behind the Hispanic representation rates of the Air Force (9.5%), Army (5.0%), and other federal agencies (5.2%) (Secretary of the Navy, memorandum of 16 May 1989). Given projections that by the year 2000 Hispanics will constitute nearly 11% of the total U.S. population (Koretz, 1989), it is clear that the Navy needs to "intensify efforts to increase the number of Hispanics in the civilian work force" (Secretary of the Navy, memorandum of 16 May 1989).

The underutilization of Hispanics, the projections of dramatic Hispanic population growth, and the potential benefits to the Navy of greater Hispanic representation attest to the need for focused research on the Hispanic underrepresentation problem. An initial step toward the better utilization of this valuable human resource is to identify the barriers that have prevented Hispanics from obtaining parity in the work place. Toward this end, the Navy instituted a four-year EEO Enhancement Research Project to increase Hispanics' opportunities for employment parity. Previous project work has focused on the difficulties of accurately defining the Hispanic underrepresentation problem (Edwards & Thomas, 1989; Thomas, 1987), a literature review on the relationships of attitudes and demographics to work outcomes (Edwards, 1988), and the geographic mobility of Hispanics for employment (Edwards, Thomas, Rosenfeld, & Bowers, 1989).

Although Navy-related studies of Hispanics have been rare, one previous intensive research effort was concerned with the barriers faced by Hispanic Navy recruits (cf. Triandis, 1985). In a summary report of their Navy-funded studies, Triandis (1985) noted that he and his colleagues had found more similarities than differences in comparisons among Hispanic, Black, and Anglo recruits. Triandis suggested that Hispanic Navy recruits of the early 1980s were not typical of Hispanics in the general population. In several reports, Triandis and colleagues argued that their research participants were so acculturated as to be indistinguishable from the mainstream of American culture. An important job-related component of acculturation is the ability to communicate in English. The National Commission on Employment Policy (1982) noted that poor English skills and lack of education are two major reasons for Hispanic labor-market difficulties.

Acculturation should be considered when determining whether Hispanic employees are different from their Anglo peers. Consideration of acculturation is also important in determining whether an organization is recruiting from the full Hispanic population or only from an acculturated portion, as Triandis (1985) suggested. A need exists to determine whether there are differences among the Navy's acculturated Hispanics, less acculturated Hispanics, and the Anglo majority group in its civilian workforce.

‘The opinions expressed in this manuscript are those of the authors. They are not official and do not represent the<br />

views of the Navy Department. The authors gratefully acknowledge the assistance of Luis Joseph, Jerome Bower<br />

and Walt Peterson.<br />

31


Method

Sample

Recruits. The sample was selected from newly hired men in semi-skilled or journey-person jobs as Department of Navy craftsmen, mechanics, operatives, or service workers at 14 Navy activities in the continental United States. Each Hispanic male who entered one of the jobs was asked to voluntarily complete a questionnaire during his first week of work. A comparison Anglo male was also surveyed whenever his entry into a similar job at the same activity followed the entry of a surveyed Hispanic male.

Respondents. Six of the 160 completed questionnaires were discarded because the persons who identified themselves as Hispanic indicated that either (a) his primary language was something other than English or Spanish or (b) his country of origin (e.g., Lebanon) was not such that findings from those individuals would generalize to persons from more commonly identified Hispanic lands. The surveys for three additional Hispanics could not be used because the participants did not supply responses to the acculturation index. As a result, 76 Hispanic and 75 Anglo surveys were analyzed.

Survey Instrument<br />

The questionnaire contained 111 items, some of which were included as part of a longitudinal study. Results pertaining to only four of the categories (demographics, acculturation, need for clarity, and potential factors considered when taking a job) are reviewed in this paper. A pre-test of the survey determined that it could be completed in less than 30 minutes. The average readability of the questionnaire was below the sixth-grade reading level.

Acculturation. The four-item acculturation scale was patterned after Kuvlesky and Patella's (1971) five-item, ethnic-identification scale. Respondents indicated how frequently they used a language other than English when they talked to family members, talked to friends, read a newspaper, or listened to a radio or TV. The anchors for the rating scale were never (1), almost never (2), sometimes (3), usually (4), and always (5).

Need for clarity. Lyons' (1971) four-item, need-for-clarity index asked respondents how important it was to know in detail: what is to be done, how the job is supposed to be done, the limits of the respondent's authority, and how well the respondent is doing. Respondents completed the need-for-clarity items using the following rating format: not important (1), neither unimportant nor important (2), somewhat important (3), important (4), and very important (5). Respondents were also given the option of indicating that an item was not true (0); such answers were treated as missing data.

Potential factors considered when taking a job. Four types of factors were investigated: importance of job-related factors, work-group composition, sources of recruitment, and job-search activities.

Procedure<br />

Defining Hispanic acculturation groups. The Hispanic respondents were grouped into high (n = 35) and low (n = 41) acculturation groups based upon their responses to the four-item scale. For all analyses, respondents whose mean acculturation scores were 2.00 or less (i.e., the respondents who never or almost never used Spanish) were classified as high acculturation Hispanics (HAHs); the remainder of the Hispanic respondents were classified as low acculturation Hispanics (LAHs).

Analyses. Whenever percentages are shown in a table, a chi-square test of independence was conducted to examine whether a relationship existed between group membership (Anglo, HAH, and LAH) and responses to an item or a composite. Whenever means are shown, a one-way analysis of variance (ANOVA) was performed with group membership as the independent variable and an item response or a composite as the dependent variable. A significant ANOVA result was followed by a Scheffe post hoc test to determine the source(s) of the difference. For all primary and secondary analyses, the probability level was set at .01. This significance level was chosen as a balance for three considerations: the exploratory nature of the research, the huge number of contrasts performed, and the already low statistical power caused by the sample sizes.
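The grouping rule and analyses just described can be sketched as follows. The response values, group sizes, and contingency table below are hypothetical; only the classification cutoff (a mean of 2.00 or less) and the .01 significance level come from the study, and a Scheffe post hoc test would follow a significant F in practice.

    import numpy as np
    from scipy import stats

    # Hypothetical acculturation responses: four items rated 1 (never)
    # to 5 (always) for each Hispanic respondent.
    rng = np.random.default_rng(0)
    accult = rng.integers(1, 6, size=(76, 4))
    mean_accult = accult.mean(axis=1)

    # Grouping rule from the study: a mean of 2.00 or less (never /
    # almost never used Spanish) -> high-acculturation Hispanic (HAH).
    group = np.where(mean_accult <= 2.00, "HAH", "LAH")

    # Hypothetical need-for-clarity composites for the three groups.
    anglo = rng.normal(4.3, 0.5, size=75)
    hah = rng.normal(4.5, 0.4, size=(group == "HAH").sum())
    lah = rng.normal(4.7, 0.3, size=(group == "LAH").sum())

    # One-way ANOVA with group membership as the independent variable;
    # a significant F (alpha = .01) would be followed by a Scheffe test.
    f_stat, p_val = stats.f_oneway(anglo, hah, lah)
    print(f"F = {f_stat:.2f}, p = {p_val:.4f}")

    # Chi-square test of independence for a categorical item
    # (e.g., veteran status) crossed with group membership.
    table = np.array([[45, 30], [23, 12], [21, 20]])  # yes / no by group
    chi2, p, dof, _ = stats.chi2_contingency(table)
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")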

Results and Discussion<br />

Demographics

In general, the Anglo and Hispanic groups were very similar (see Table 1). All three groups averaged about 34<br />

years of age, more than 12 years of education, and approximately 17 years of working for pay. Almost all of the<br />

respondents reported that they had been employed previously on a full-time basis and that they were not currently<br />

members of a union. The members of each group averaged similar amounts of time (between 4.50 and 6.75 years)<br />

in their last full-time job.<br />

32


Table 1
Demographics

                                                                Anglo     HAH     LAH

 4. Age (mean number of years)                                  34.81   33.60   34.00
 5. What is the highest grade you completed in school or
    college? Count a GED as 12 years.                           12.60   12.54   12.28
 6. Since you became 16, how many years have you worked
    for pay?                                                    17.92   16.69   17.16
56. Is this your first full-time job? (Answered "Yes")           1.4%    2.9%   10.0%
    If "No," how long were you employed full time in your
    last job? (years)                                             6.64    4.59    6.66
10. Are you a veteran? (Answered "No")                           40.0%   34.3%   48.8%
11. Are you a member of a union? (Answered "Yes")                 9.1%   16.7%   16.7%
12. Have you worked for the Navy in some other civilian
    jobs? (Answered "Yes")                                       20.0%   22.9%   37.5%

Two interesting but non-significant differences were observed. Compared to both Anglos and HAHs, a larger proportion of the LAHs reported having worked in other civilian Navy jobs. Second, 65.7% of the HAHs were veterans. That proportion is higher than either the 60.0% for Anglos or the 51.2% for the LAHs.

The overall similarity of the three groups with regard to demographics both clarifies and cautions the interpretation of subsequent findings. The similarity weakens any argument that demographic differences were at least partially responsible for any subsequent difference among the groups. For example, the similarity with regard to veteran status lessens the possibility that the additional points awarded to veterans would differentially affect the time between application and employment for one or more groups. Still, caution must be exercised in the interpretation of these and subsequent findings. One reason for caution is the atypicality of the Hispanics in this sample with regard to education. The Census Bureau (U.S. Department of Commerce, September 7, 1988) reported that 51% of all Hispanics aged 25 and above had completed high school and/or college during 1987 and 1988. Although this is an all-time high for Hispanics, it is still markedly lower than the 78% completion rate for non-Hispanics. Therefore, even though the three groups in this study are similar in terms of education, this study's Hispanic sample is different from the Hispanic population. Second, conclusions are tenuous because of the small sample and low statistical power.

Need for Clarity<br />

All three groups indicated a very high need for clarity, with LAHs reporting the highest need for clarity. The<br />

need-for-clarity scale mean for LAHs (4.72) was significantly higher than the mean for Anglos (4.33) and<br />

nonsignificantly higher than that of HAHs (4.49). The situation in the Hispanic population may be more extreme<br />

than implied by that small difference. The lower education level of the Hispanic population, in comparison to the<br />

sample participating in the present study, may result in yet more need for clarity by less-educated Hispanics.<br />

Gould (1982, p. 97) cited several studies that have shown that “Mexican-Americans do not tolerate ambiguity<br />

and uncertainty well”. The strong authoritarian role of fathers and emphases on sex roles and discipline in such<br />

families were suggested as possible reasons for Gould’s findings. The significant need-for-clarity difference found<br />

in this study also supports Ash, Levine, and Edgell’s (1979) finding that when given a chance to choose tasks,<br />

Hispanic (more so than Black or Anglo) job applicants disproportionately indicated a preference for jobs in which<br />

others would tell them what to do next.<br />

Potential Factors To Be Considered When Taking a Job

Importance of job-related factors. Table 2 shows the mean ratings for each group for each of the 10 factors. In addition to all three groups evaluating each factor at essentially the same level of importance, the average ratings for the factors showed the same pattern across the three groups. The 10 Anglo means correlated .93 (p < .001) with the 10 corresponding HAH means and .94 (p < .001) with the 10 LAH means. The HAH and LAH means correlated .84 (p < .001). The most important factor for Anglos and HAHs, and nearly the most important factor for LAHs, was the job security provided by the government. These findings show that all three groups valued the same rewards and outcomes and that the average value placed on any factor did not vary by group when ethnicity and acculturation were examined.

33


Table 2
Potential Factors to Be Considered When Taking a Job

                                                                Anglo     HAH     LAH

Importance of Job-Related Factors
41. Working for the government provides a lot of job
    security.                                                    4.00    4.48    4.33
48. I think the job will be interesting or challenging.          3.98    4.00    4.12
46. The government provides EEO for promotions, training,
    etc.                                                         3.97    4.23    4.37
45. Benefits (time off, health ins., etc.) are good.             3.93    4.29    4.38
42. The pay is good.                                             3.83    4.20    4.22
43. The hours of my work schedule are good.                      3.75    3.65    4.28
40. I badly need a job.                                          3.74    4.12    4.23
47. I can learn a new skill.                                     3.65    4.03    4.17
44. I don't have to drive too far or can take a bus.             2.93    3.48    3.05
49. I have friends or relatives working here.                    2.33    2.24    3.04

Work-Group Size Preferences
65. What size group would you like to work in? That is, how
    many people, counting yourself, would you like your boss
    to supervise?                                               13.78   12.51   14.24
66. Imagine you were working with 10 other people every day.
    How many of those people would you like to be of your
    race and ethnic group?                                       4.64    3.15    4.08

Recruitment: How did you find out about this job?
(% indicating source. Place an "X" by as many answers as apply and write in the information asked.)
17. From a friend or relative                                   48.6%   42.9%   56.1%
16. Federal job listing                                         21.6%   22.9%   12.2%
15. Newspaper ad                                                12.2%   11.4%   14.6%
22. Employment office or program                                10.8%   11.4%   14.6%
23. Other                                                       10.8%   17.1%   12.2%
21. School counselor or training program                         2.7%    0.0%    0.0%
19. I was a trainee or intern for this job.                      2.7%    0.0%    0.0%
18. From the union                                               1.4%    0.0%    7.3%
20. EEO office                                                   1.4%    2.9%   12.2%

Job Search
57. How many months passed between the final day of work on
    your last full-time job and your first day at work on
    this Navy job?                                               3.22    2.21    3.31
58. How many months did it take from the time you filed your
    application for this job to your first day of work?          5.02    3.60    4.23
59. How many times during the last 3 months did you check
    the Federal government job listings?                         2.44    4.00    3.89
60. During the last 12 months, how many Federal government
    jobs did you apply for?                                      1.35    1.45    1.97
61. During the last 12 months, how many other jobs did you
    apply for?                                                   4.47    3.26    4.02

Note: The totals for the Recruitment rows are greater than 100% because respondents could indicate more than one source.

34<br />

.


Work-group composition. The average desired number of persons sharing the respondent's race/ethnicity was the same across the three groups (see Table 2). On average, Anglos desired to work in groups that were 46.4% Anglo; HAHs, 31.5% Hispanic; and LAHs, 40.8% Hispanic.

Given that less than 10% of the current U.S. population is Hispanic, the average desirable composition of the work groups for Hispanics may be unobtainable (even in locations such as those in this study that exceeded the current national average). Furthermore, assigning a disproportionately high number of Hispanics to the same work group could result in segregated work groups and open an organization to discrimination complaints.

Sources of recruitment. Nine chi-square tests of independence found no significant relationship between group membership and method of recruitment (see Table 2). Nearly half of all the respondents indicated that they found their jobs through a friend or relative. Because there are proportionally a great many more Anglos than members of other ethnic/racial groups working for the Navy, and because the Navy already suffers from Hispanic underrepresentation, continued reliance on this recruitment method may perpetuate the current representation problems. Also noteworthy is the fact that so few persons were recruited by employment and EEO offices. Affirmative action recruitment apparently was not being done, or at least was not being done effectively.

Job search. Group means for the months spent getting the current job and the activeness with which the newly hired employees were previously pursuing employment opportunities are shown in Table 2. The short time between leaving a previous full-time job and obtaining employment with the Navy suggests that many of the newly hired employees from all three groups were working elsewhere until the time that they were hired by the Navy. For the other non-significant difference for a time-related variable, both Hispanic groups were, on average, marginally faster than Anglos in obtaining their new jobs. Together, these time-based questions seem to indicate that Hispanics and Anglos are being treated equally during the hiring phase whenever they have similar job-related demographic characteristics such as education and veteran's preference.

No ethnic or acculturation difference was detected for the three items measuring how actively the respondents were seeking their jobs. During the year prior to completion of the survey, the average number of jobs applied for was 6.00 or less for all three groups.

Conclusions and Recommendations<br />

A goal of the present study was to identify factors among newly hired personnel that might help to explain the reasons for Hispanic underrepresentation in the Navy's blue-collar civilian work force. Overall, the results indicate that both high- and low-acculturated Hispanics were more similar to Anglos than they were different. These similarities were obtained for both demographic variables and factors potentially influencing decisions to take a new position. Echoing Triandis' (1985) findings with Hispanic Navy recruits, the results of the present study indicate that the Navy is attracting Hispanics into its blue-collar workforce who are indistinguishable on a variety of dimensions from the majority (Anglo) group. As research on Hispanics in work settings continues to grow (e.g., Knouse, Rosenfeld, & Culbertson, in preparation), it will be of interest to see whether Hispanics entering other government and private-sector organizational settings are likewise similar to Anglos on key psychological and organizational dimensions. If indeed these Hispanics are, then organizations may need to refocus their efforts to attract those individuals whose characteristics are more reflective of the Hispanic population rather than a subgroup who are indistinguishable from Anglos.

This investigation did, however, reveal one organizational practice (recruitment) and one individual-difference variable (need for clarity) that could be contributing to the lack of parity for Hispanics. The following interventions are suggested for dealing with those issues.

1. Use more formal recruitment methods. An investment in formal recruitment (e.g., advertisements and job fairs designed especially for Hispanic communities) could ease future recruitment costs as Hispanic numbers continue to increase. If no change in recruitment procedure occurs, these findings suggest that the Navy will continue to experience non-parity for Hispanics. The Office of Personnel Management's recently formed "Project Partnership," an alliance with the Hispanic Association of Colleges and Universities and National Image, Inc., may prove useful as a means of increasing the number of Hispanics recruited (Weekly Federal Employees News Digest, March 19, 1990).

2. Enhance clarity through supervisory training. The Navy already has the required vehicle for implementing such training in the form of supervisory EEO training sessions. Supervisors could be presented with (a) methods for structuring tasks and duties and (b) the

35<br />

.


processes used in mentoring. While these interventions may be specifically designed to aid less acculturated Hispanics, they also can help employees from other ethnic and racial groups.

References

Ash, R. A., Levine, E. L., & Edgell, S. L. (1979). Exploratory study of a matching approach to personnel selection: The impact of ethnicity. Journal of Applied Psychology, 64, 35-41.

Burnam, M. A., Telles, C. A., Karno, M., Hough, R. L., & Escobar, J. I. (1987). Measurement of acculturation in a community population of Mexican Americans. Hispanic Journal of Behavioral Sciences, 9, 105-130.

Edwards, J. E. (1988). Work outcomes as predicted by attitudes and demographics of Hispanics and non-Hispanics: A literature review (NPRDC Tech. Note 88-23). San Diego, CA: Navy Personnel Research and Development Center.

Edwards, J. E., & Thomas, P. J. (1989). Hispanics: When has equal employment been achieved? Personnel Journal, 68, 144, 147-149.

Edwards, J. E., Thomas, P. J., Rosenfeld, P., & Bower, J. L. (1989, August). Moving for employment: Are Hispanics less geographically mobile than Anglos and Blacks? Paper presented at the meeting of the Academy of Management, Washington, DC.

Gould, S. (1982). Correlates of career progression among Mexican-American college graduates. Journal of Vocational Behavior, 20, 93-110.

Knouse, S. B., Rosenfeld, P., & Culbertson, A. (Eds.). (in preparation). Hispanics and work. Newbury Park, CA: Sage.

Koretz, G. (1989, February 20). How the Hispanic population boom will hit the work force. Business Week, 21.

Kuvlesky, W. P., & Patella, V. M. (1971). Degree of ethnicity and aspirations for upward social mobility among Mexican American youth. Journal of Vocational Behavior, 1, 231-244.

Lyons, T. F. (1971). Role clarity, need for clarity, satisfaction, tension, and withdrawal. Organizational Behavior and Human Performance, 6, 99-110.

Marin, G., Sabogal, F., Marin, B. V., Otero-Sabogal, R., & Perez-Stable, E. J. (1987). Development of a short acculturation scale for Hispanics. Hispanic Journal of Behavioral Sciences, 9, 183-205.

National Commission on Employment Policy. (1982). Hispanics and jobs: Barriers to progress. Washington, DC: Author.

Rojas, L. A. (1982). Salient mainstream and Hispanic values in a Navy training environment: An anthropological description (Tech. Rep. No. ONR-22). Champaign, IL: University of Illinois, Department of Psychology.

Secretary of the Navy. (1989, May 16). Memorandum on Hispanic employment.

Thomas, P. J. (1987). Hispanic underrepresentation in the Navy's civilian work force: Defining the problem (Tech. Note No. TN 87-31). San Diego, CA: Navy Personnel Research and Development Center.

Triandis, H. C. (1985). An examination of Hispanic and general population perceptions of organizational environments: Final report to the Office of Naval Research. Champaign, IL: University of Illinois, Department of Psychology.

U.S. Department of Commerce, Bureau of the Census. (1988, September 7). Hispanic educational attainment highest ever, Census Bureau reports. Press release from United States Department of Commerce News, Bureau of the Census.

U.S. Department of Commerce, Bureau of the Census. (1985). Persons of Spanish origin in the United States: March 1985 (Advance report). Washington, DC: U.S. Government Printing Office.

Weekly Federal Employees News Digest. (1990, March 19). p. 4.

36<br />

.


DESCRIPTORS OF JOB SPECIALIZATION
BASED ON JOB KNOWLEDGE TESTS

by

C. Lee Walker, Omnibus Technical Services
Jeffery A. Cantor, Lehman College, CUNY

1. INTRODUCTION

This study was undertaken to determine if job knowledge test and training history data could be used to define a billet substructure reflecting specialization on certain equipments for which a Navy Enlisted Classification (NEC) was responsible. It was hypothesized that such specialization, if existing, could be recognized by score patterns in System Achievement Tests (SATs) and by related patterns in the use of advanced training. Specialization thus identified could be confirmed by limited fleet surveying. The investigation produced data which suggested specialization, but more particularly it produced course use patterns and course/SAT score relationships which provide insight into the way ships use advanced training to support readiness. This paper presents information on the methodology thus derived, with the hope that it will provide a point of departure for other persons faced with developing training analysis methodologies.

2. APPROACH<br />

The Poseidon Fire Control Technician (NEC 3303) was chosen as the subject for the investigation because, with six members per crew, it provided an adequate population for developing a methodology that could then be applied to larger populations. Two basic methods of investigation were used: (1) reviews of Personnel and Training Program Evaluation Program (PTEP) Personnel Data System (PDS) data and (2) discussions with training petty officers. The PDS review data was used as a guide in the discussions with fleet personnel.

37


2.1 PERSONNEL DATA SYSTEM INFORMATION. Scores, course attendance, and duty stations were extracted from the PDS system for all FTB 3303 personnel.

2.1.1 Study Population. The study hypothesis required that the records of personnel be reviewed for events or changes occurring over the course of a person's career in order to determine at what point in service specialization on an equipment or group of equipments took place. For the study, this specialization or substructure was of interest for personnel in submarine crews. These comprise the bulk of the NEC. Because of the continuing evolution of equipment, training, and measurement, some time limit needed to be applied to the data used so that analysis results would be relevant. Building on these general requirements, three criteria were established for selection of records for analysis. For each record selected, the person had to have:

a. Graduated from "C" school after the prescribed start date.

b. Reported to the crew of an operating SSBN directly after "C" school.

c. Taken four or more SATs.

Review of the data extracted indicated that few personnel in paygrade E-4 had sufficient SAT records to be useful for analysis. Personnel in paygrade E-6 had duty station histories which made a "cause-effect" analysis of school records not very useful. It was therefore determined to use a survey population of persons serving in paygrade E-5. Ninety-six persons in paygrade E-5 met the criteria and all were used in the study. Preliminary analysis was done by randomly dividing the population into two equal groups to look for consistency of results. The study methodology was also applied to 16 persons in paygrade E-6 meeting similar criteria to ensure that no discontinuities were introduced by limiting the population to paygrade E-5.

38



In order to analyze for the point in a person's career at which responsibility for or expertise on a particular equipment was achieved, events of interest were assigned to relative Time Groups. The Time Groups were based on patrol cycles after graduation from "C" school. Schools or SAT scores were recorded for each individual in the patrol cycle sequence to which they belonged rather than being assigned on a calendar year basis.

Since some records reflected more patrol cycles than others, the part of the survey population still present diminished in the later Time Groups. Table 1 shows the Time Groups used and the records with information for each Time Group.

Table 1. Time Group Population

Time Group     1    2    3    4    5    6    7    8
Population    96   96   96   96   67   49   29   11

The events, i.e., scores and schools, were then summarized based on the Time Group into which they fell.
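A minimal sketch of the Time Group assignment follows; it assumes a simple record layout (person, patrol cycle after "C" school, event) that stands in for the PDS extract rather than reproducing it.

    from collections import defaultdict

    # Hypothetical event records: (person_id, patrol_cycle_after_C_school, event).
    # Events are SAT area scores or advanced-course attendances.
    records = [
        (101, 1, ("SAT", "area_A", 62)),
        (101, 2, ("COURSE", "course_1", None)),
        (101, 3, ("SAT", "area_A", 55)),
        (102, 1, ("SAT", "area_B", 48)),
        (102, 2, ("SAT", "area_B", 61)),
    ]

    # Assign each event to a Time Group based on the patrol cycle in
    # which it occurred, not the calendar year.
    time_groups = defaultdict(list)
    for person, cycle, event in records:
        time_groups[cycle].append((person, event))

    # Population still present in each Time Group (records with data).
    population = {tg: len({p for p, _ in events}) for tg, events in time_groups.items()}
    print(population)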

2.1.3. SAT Score Analysis. The SATs used for this analysis were administered upon completion of "C" school and during each SSBN "off crew" period. The tests are broken into several equipment-dependent areas. Test versions are changed every five months, with occasional changes in the number and size of areas between test versions. The score analysis was based on test areas, with scores recorded in their appropriate Time Groups. Since individuals entered the system at different times, results from several test versions were included in each Time Group. To dampen the difference between test versions, normalized scores were used as the basis of analysis. It was hypothesized that specialization related to a billet substructure should be reflected in higher SAT scores in an area.

39<br />

.


This "specialization" score elevation should occur in a way that can be identified apart from the normal increase in scores associated with time. Both overall SAT scores and scores one standard deviation above the mean were examined as possible indicators. Scores of 60 or greater, combined with other training-related data, were selected as the most serviceable indicators of specialization.
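One way to carry out the normalization and flag scores of 60 or greater is sketched below. Standardizing within each test version to a mean of 50 and a standard deviation of 10 is an assumption made for illustration, not the study's documented scaling, and the raw scores are invented.

    import numpy as np

    def normalize_by_version(scores, versions, mean=50.0, sd=10.0):
        """Standardize scores within each test version so that version
        differences are dampened; the 50/10 metric is an assumption."""
        scores = np.asarray(scores, dtype=float)
        out = np.empty_like(scores)
        for v in np.unique(versions):
            mask = np.asarray(versions) == v
            z = (scores[mask] - scores[mask].mean()) / scores[mask].std(ddof=1)
            out[mask] = mean + sd * z
        return out

    raw = [71, 64, 58, 80, 75, 69, 62, 55]
    version = ["A", "A", "A", "A", "B", "B", "B", "B"]
    norm = normalize_by_version(raw, version)

    # Flag normalized scores of 60 or greater as possible indicators
    # of specialization in an SAT area.
    high = norm >= 60.0
    print(np.round(norm, 1), high)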

2.1.4. Course Analysis. Attendance at advanced FTB rating training courses was analyzed. Course attendance was recorded in its appropriate Time Group and also identified in relation to specific SAT administrations.

2.1.5. Score, Course, and Experience Indicators. Many relationships between scores, courses, and time were examined as possible indicators of specialization. It was felt that the indicators used should be straightforward to derive and easy to interpret. The following paragraphs detail the indicators chosen and their role in interpretation.

2.1.5.1. Scores Equal to or Greater than Sixty (60S). This is a count, for either groups or test areas, of the number of scores of 60 or above occurring. Scores over 60 in any Time Group which differ greatly from those expected in a normal distribution suggest that specialization is occurring at that point. Conversely, a less than expected number of 60s suggests limited employment in that area during the Time Group in question.

2.1.5.2. Persons Receiving Scores Equal to or Greater Than Sixty (60P). This is a count of the persons receiving scores equal to or greater than 60. Each person is counted only the first time he receives a 60 or greater. This is a means of determining if the same or different people are getting the high scores.

2.1.5.3. Number of Sixties per Person (60S/60P). This is a ratio of the number of scores of 60 or above to the people getting the scores. The number must always be one or greater, with higher numbers indicating more repetition of 60 scores. Repetition suggests continuing on-the-job reinforcement.

40


2.1.5.4. Percentage of Persons Receiving a Score of Sixty or Greater (60PS). This shows, for each Time Group, the percentage of the survey population receiving a score of 60 or greater. It is used in addition to the actual number of persons (60P) receiving a score of 60 or greater to enable comparison in the higher Time Groups, where the population drops off.

2.1.5.5. Persons Receiving a Score of 60 or Greater on the First SAT Taken after Advanced Training. Two indicators were developed based on the test performance of personnel on the first SAT following advanced training. These relate the high scores to the number of people being trained and the scores of trained people to the overall high scores. Figure 2 is a modified Venn diagram depicting the relationship of the indicators.

2.1.5.5.1. Number of Persons Receiving a Score of Sixty or Greater on the First SAT Taken after Advanced Training Relative to the Number of Persons Attending Advanced Training (Sc60/ScP). This is an indication of the relationship between the advanced course and the SAT area. A large number suggests a close content relationship between the course and the SAT.

2.1.5.5.2. Number of Persons Receiving a Score of Sixty or Greater on the First SAT Taken after Attending Advanced Training Relative to the Number of Persons Receiving a Score of Sixty or Greater (Sc60/60P). This is an indication of the effect of schools on performance as measured by SATs. When using 60s as an indicator of specialization, it is important to know if the high scores are strongly school influenced or if they reflect primarily work experience.
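The indicators defined in sections 2.1.5.1 through 2.1.5.5.2 can be computed from simple per-person records, as in the sketch below; the record layout and the values are hypothetical.

    # Hypothetical per-person records for one SAT area and one advanced
    # course: normalized scores by Time Group and whether the person's
    # first SAT after the course was 60 or greater.
    people = {
        101: {"scores": [62, 65, 58], "attended_course": True,  "first_sat_after_course": 65},
        102: {"scores": [48, 61, 63], "attended_course": False, "first_sat_after_course": None},
        103: {"scores": [55, 57, 59], "attended_course": True,  "first_sat_after_course": 57},
    }

    population = len(people)
    sixties = sum(sum(s >= 60 for s in p["scores"]) for p in people.values())      # 60S
    persons_60 = sum(any(s >= 60 for s in p["scores"]) for p in people.values())   # 60P
    per_person = sixties / persons_60 if persons_60 else 0.0                        # 60S/60P
    pct_60 = persons_60 / population                                                # 60PS
    trained = [p for p in people.values() if p["attended_course"]]                  # ScP
    sc60 = sum(p["first_sat_after_course"] is not None and
               p["first_sat_after_course"] >= 60 for p in trained)                  # Sc60
    print(f"60S={sixties}, 60P={persons_60}, 60S/60P={per_person:.2f}, "
          f"60PS={pct_60:.2f}, Sc60/ScP={sc60/len(trained):.2f}, "
          f"Sc60/60P={sc60/persons_60:.2f}")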

3. FINDINGS

The initial focus of this study was on the identification of equipment specialization within an NEC. The research, in addition, yielded data on school utilization and personnel performance which, although not directly related to specialization, provide important insights on the training system. Findings in all three areas have been included in the report.

3.1 SPECIALIZATION. Specialists are persons specifically responsible, or looked to, for operation or maintenance of some limited part of the entire

41<br />

.


[Rectangular Venn diagram: region C is the overlap of regions A and B, with the ratios Sc60/60P and Sc60/ScP marked.]

Where A = the number of persons receiving a score of 60 or greater (60P)
      B = the number of persons attending advanced training (ScP)
      C = the number of persons who both attend advanced training and receive a score of 60 or greater. In deriving C, the further qualification was placed that the score of 60 be on the first SAT following advanced training (Sc60)

Figure 2. Relationship of Indicators Based on Scores After School


NEC responsibility. The existence of specialization was identified from personnel data and was confirmed by personnel survey. Specialization may relate to either time or ability.

3.1.1. Specialization with Respect to Time. Specialization with respect to time means that certain equipments or areas become the responsibility of technicians primarily based on experience level. In this type of specialization, most people with similar experience levels can be expected to be assigned responsibility for a specific equipment. In general, time specialization breaks down into equipments assigned to newly reported personnel and some reserved for highly experienced personnel.

3.1.2. Specialization with Respect to Ability. This type of specialization reflects some special aptitude, an ability that leads a person to become involved in most corrective maintenance in an area. Specialization of this nature will begin as soon as the ability is recognized and continue throughout a person's tenure with the command.

3.1.3. No Specialization. Areas with no specialization may be worked by any technician without recourse to specialists. These are areas that are either simple enough or frequently enough worked that a sufficient degree of competence can be expected and utilized in all technicians.

3.2. COURSE ATTENDANCE. There were thirteen advanced courses applicable to the NEC. Very few people attend all courses, and attendance is concentrated in the first part of a technician's first sea duty tour. Three distinct patterns of attendance may be derived from the various courses.

3.2.1 Early Attendance. These courses showed an attendance pattern with very heavy usage prior to and following the first patrol, then diminishing rapidly to little or no use.

43


3.2.2. Normal Attendance. This attendance pattern begins with little attendance prior to the first patrol, peaks during the second or third off crew, and then tapers off slowly.

3.2.3. Level Attendance. This attendance pattern shows relatively steady attendance over five or six off crews. Each of the courses that had level attendance also had below average attendance.

3.3. PERSONNEL PERFORMANCE. Personnel performance as measured by the chosen indicators varied greatly between equipments.

3.3.1 Percentage of Persons Receiving a Score of Sixty or Greater on the Test Following Advanced Training. This relationship varied from a low of 17% to a high of 40%. This largely reflects the relationship between the SAT and the course.

3.3.2. Percentage of Persons Receiving a SAT Score of Sixty or Greater Who Do So on the Test Following Advanced Training. This relationship goes from a low of 21% to a high of 57%. The average value was 34%, which implies that most persons achieving high scores have not been directly influenced by advanced training.

4. EXAMPLES

Thirteen courses were analyzed as part of the study. Diagrams and discussion are provided on five courses. Two courses show specialization with respect to time and, more particularly, use early in an assignment. Two courses show no strong evidence of specialization. One course suggests specialization with respect to ability. For each, a rectangular Venn diagram and tabular data are provided on the "sixties" related indicators. The tabular data show the average value in each category for all thirteen courses in the study and the value for the particular course or area. In addition, there is a graphical presentation of school attendance (C) and "sixty" scores (60) with respect to time periods. The diagrams and accompanying discussion are each presented on a separate page.

44<br />

.


Course 1

4.1 TIME SPECIALIZATION, COURSE 1. Specialization with respect to time is indicated by a high initial schooling rate dropping off rapidly and by the number of high scores occurring during the 2nd, 3rd, and 4th time periods. The percentage of people getting high scores following schooling, the relative shapes of the "C" and "60" curves, and the low repetition rate of high scores all indicate that experience is of greater importance to high scores than school. The low score repetition rate suggests that people are rotated through responsibility for this equipment and receive little reinforcement on the equipment when not specifically assigned. Interviews with fleet-experienced personnel confirmed that this equipment is generally assigned to new personnel as a place to get them gently started.

45


[Figure: Course 2, rectangular Venn diagram of the ScP%, Sc60, and 60P% indicators, with school attendance (C) and "60" scores plotted by patrol/training period.]

4.2 TIME SPECIALIZATION, COURSE 2. This is an area of specialization for newly reported personnel. This is shown by the high initial schooling rate and the number of "60" scores obtained during the 2nd time period. At 72%, this is one of the most used courses by the NEC; however, high scores following the course occur at little better than the chance rate. Most of the high scores reflect experience, not schooling. The high score repetition rate of 1.50 is the lowest of any of the thirteen areas in the study and suggests that this specialization is short lived and that continuing reinforcement in later patrols is not present.

46


[Table and figure: Course 3 indicator values (average for the thirteen study courses versus this area; the area "60" repetition rate is 2.05 against an average of 1.78), with school attendance (C) and "60" scores plotted by patrol/training period.]

4.3 NO SPECIALIZATION, COURSE 3. The score pattern shown on the graph is what might be expected with the normal progress of a maturing population. (It must be remembered that the tapering off to the right of the graphs reflects decreasing population rather than a lower percentage of high scores.) School use peaks in the second time period and tapers off rapidly. The number of people getting "sixties" after the course is barely above the chance level of a normal distribution, indicating that the course and test were not well aligned. The relatively high "60" repetition rate (2.05) suggests a good relationship between the test and the actual work being performed.

47


[Table and figure: Course 4 indicator values (the area Sc60/60P value is .57), with school attendance (C) and "60" scores plotted by patrol/training period.]

Course 4

4.4 NO SPECIALIZATION, COURSE 4. Test and personnel data provide no indication of specialization in platform positioning within the survey population. The course appears to be closely related to test content. The very high percentage of persons achieving a 60 who do so following school (57%) suggests a stronger relationship between school and the area than is present for most other courses. Discussions with senior petty officers suggest that this area may actually represent an area of time specialization, with specialization occurring outside the study population.

48


[Figure: Course 5, school attendance (C) and "60" scores plotted by patrol/training period.]

Course 5

4.5 SPECIALIZATION BY ABILITY, COURSE 5. Specialization in this area is suggested by the high percentage of persons who get good scores following the schooling and the very high "60" repetition rate (2.30, highest of the thirteen study courses). This repetition rate suggests continuing on-the-job reinforcement of school material. The course, although very productive, is used on only a limited basis, suggesting it is employed principally when a replacement is wanted for the current specialist.

The indicators derived in this study to analyze the relationships between training, job knowledge testing, and job performance provide analytical

49



conclusions which are consistent with survey responses. That is, if you form a hypothesis from the indicators, it can consistently be confirmed by survey. The indicator values used here were manually derived from lists of computer data but, once proven, lend themselves to computer analysis on a regular basis. The rectangular Venn diagram was adopted as a solution to the puzzling problem of how to easily show accurate overlap of circles of different sizes. It takes a little practice, but they are easy to use. Interpretation of the various indicators was facilitated by the divergence of patterns. Proposed interpretations could be confirmed by the changes of the indicators with different use patterns. One could say, "if the relationship changes in way X, then the indicator will change in way Y," and confirm the hypothesis with another pattern from the study. This use of test and course usage data permits an objective, in-depth analysis of the training relationships that can usually be achieved only by extensive data analysis and survey. While not eliminating all need for on-site survey in training evaluation, it supports less and better focused survey time.

50


ADDRESSING THE ISSUES OF “QUANTITATIVE OVERKILL" IN JOB ANALYSIS<br />

Julie Rheinstein<br />

Brian S. O'Leary<br />

Donald E. McCauley, Jr.<br />

U.S. Office of Personnel Management<br />

Washington, D.C.<br />

Paper presented at the 32nd Annual Conference of the Military Testing Association, November 5-9, 1990, Orange Beach, Alabama.

51<br />

. .


ADDRESSING THE ISSUES OF "QUANTITATIVE OVERKILL" IN JOB ANALYSIS<br />

Julie Rheinstein<br />

Brian S. O'Leary<br />

Donald E. McCauley, Jr.<br />

U.S. Office of Personnel Management<br />

Schmidt, Hunter and Pearlman (1981) have indicated that molecular job analyses are unnecessary in selection research involving traditional aptitude tests. Fine-grained, detailed job analyses tend to create the appearance of large differences in jobs, whereas, in fact, the differences are of no practical significance in selection. Our recent job analysis research has focused on looking at how job analysis projects can be less detailed and less cumbersome while still allowing one to obtain the necessary information for test development.

O'Leary, Rheinstein and McCauley (1989, 1990) discussed several "holistic" job-analytic approaches used in forming job families. Their research suggests that the traditional fine-grained, job-analytic approach may not always be necessary, especially when one is in a fast reaction situation.

In the first phase of a project for the development of an examination for Federal professional and administrative career occupations, job families were formed using a procedure developed by Rosse, Borman, Campbell and Osburn (1985) (see O'Leary, Rheinstein, and McCauley, 1990, for a detailed explanation of the formation of job families). Once the families had been established, it was necessary to determine the importance of various abilities for job performance, and which abilities to measure by a written test.

The "inferential leap" (i.e., the inferring of human abilities important for job performance) is traditionally performed by a panel of "subject matter experts." However, there is little guidance in the literature concerning the composition of this panel of experts. As Landy (1988) has so ably indicated, incumbents are the ones most familiar with the job itself but are often unfamiliar with the conceptual or operational characteristics of the abilities. On the other hand, job analysts (often psychologists) are familiar with the characteristics of the abilities but are often not very familiar with the job itself.

The recent work of Butler and Harvey (1988) and Harvey (1989), showing that different kinds of experts (e.g., incumbents versus supervisors) provide different views of a job, and often conflicting information, would seem to suggest that one might get different results in job-ability linkage studies depending upon the composition of the panel of experts. We were able to address this issue by comparing the job-ability linkage ratings made by personnel research psychologists to the same ratings made by job incumbents.

When one conducts a traditional job analysis, the question becomes how much information should be collected. Often raters are asked to rate tasks on several scales such as importance, time spent, difficulty, or physical demands. Weismuller, Staley and West's (1989) research indicates that ratings on one scale are contaminated by ratings on other scales. Anecdotal findings from job analysts indicate that obtaining ratings on importance, time spent, etc. is unnecessary in most cases because the ratings are highly correlated across scales.

This paper will look at several aspects of job analysis and how the traditional, fine-grained methods may result in "quantitative overkill." We will present data on several techniques for determining the importance of

52<br />



abilities for test development (Studies 1 and 2), as well as look at the relationship between relative importance and relative time spent ratings (Study 3).

STUDY 1

Data Collection: Ninety-four professional and administrative occupations in the civilian Federal work force were studied. A list of major duties was developed for each occupation. Then a list of abilities was developed by reviewing the construct literature (Northrop, 1989; French, Ekstrom and Price, 1963; and Peterson and Bownas, 1982). Abilities that could not be assessed through a written test were not included. The resulting list contained seven abilities: verbal comprehension, general reasoning, number facility, logical reasoning, perceptual speed, spatial orientation, and visualization.

Using the job-specific duty lists that were developed for each occupation, five research psychologists rated the seven abilities for their importance to each overall job using a five-point scale. The scale ranged from "1-Unimportant" to "5-Crucial." It should be stressed that, rather than rate each ability against each duty for every job, the psychologists were asked to read each duty list in its entirety and make a "holistic" judgment concerning the importance of each of the abilities for the overall job. The psychologists made holistic ratings for the occupations. Based on each psychologist's overall ratings for each job, averages for each ability were computed for each of the six job families.

Approximately 6,000 job incumbents completed and returned the inventory. As part of this inventory, job incumbents were asked to rate each of the seven abilities using the same five-point scale used by the psychologists. The inventory stressed that the incumbents rate each ability as it related to their overall job. That is, they were asked to make a "holistic" judgment about the importance of each ability. Based on each incumbent's overall rating for their job, averages were computed for each job family for each ability.

RESULTS<br />

The mean overall ability ratings of the psychologists and the job<br />

incumbents, for two of the largest job families, can be found in Table 1.<br />

Table 1. Comparisons of ability ratings for psychologists (N=5) and job<br />

incumbents by job family.<br />

                                 Psychologists        Incumbents
                                 Mean    S.D.      Mean    S.D.       n

Business, Finance & Management Occupations
Verbal Comprehension             4.56    (.41)     4.40    (1.10)   2306
General Reasoning                4.60    (.41)     4.13    (1.13)   2306
Number Facility                  3.99    (.71)     3.54    (1.27)   2306
Logical Reasoning                4.38    (.48)     3.59    (1.29)   2305
Perceptual Speed                 1.82   (1.12)     2.72    (1.33)   2306
Spatial Orientation              1.68    (.62)     1.57    (1.29)   2306
Visualization                    1.55    (.38)     1.95    (1.46)   2306

Personnel, Administration & Computer Occupations
Verbal Comprehension             4.64    (.44)     4.51    (1.03)   1197
General Reasoning                4.59    (.40)     4.25    (1.07)   1197
Number Facility                  3.18    (.56)     3.07    (1.31)   1197
Logical Reasoning                4.44    (.50)     3.59    (1.29)   1197
Perceptual Speed                 1.66    (.92)     2.72    (1.33)   1197
Spatial Orientation              1.34    (.49)     1.57    (1.29)   1197
Visualization                    1.26    (.46)     1.95    (1.46)   1197


The average estimate of reliability (Cronbach's alpha) of the ratings across<br />

the six job families was .99 for the psychologists and .84 for the incumbents.<br />

There was very high agreement between the psychologists and the job<br />

incumbents in terms of the relative importance of the abilities to the jobs in<br />

each job family. The product-moment correlations among the mean ability ratings<br />

for the two groups of raters ranged from .96 to .98 and rank order correlations<br />

ranged from .89 to .96.<br />

To investigate whether or not the psychologists and the job incumbents<br />

agreed in terms of the absolute importance ratings given to the abilities, tests<br />

of the significance of the difference between the means for the two groups of<br />

raters were performed. In the majority of cases, the pairs of means were found

to be significantly different. It should be borne in mind, however, that due<br />

to the large numbers in the incumbent group, even very small absolute differences<br />

will be statistically significant.<br />

When the mean ratings for both groups were dichotomized into those<br />

determined to be important (equal to, or greater than, 3.0--"Important" on the<br />

five-point scale) and those determined not to be important (less than 3.0 on the<br />

five-point scale), the two groups of raters were found to be in perfect<br />

agreement.<br />
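
The computations behind these comparisons are simple enough to sketch. The following is a minimal illustration (not the authors' code) of the relative-agreement correlations and the 3.0 dichotomization for one job family, using the Table 1 means for the Business, Finance & Management occupations; the function names and output formatting are assumptions added for clarity.

```python
from statistics import mean

ABILITIES = ["Verbal Comprehension", "General Reasoning", "Number Facility",
             "Logical Reasoning", "Perceptual Speed", "Spatial Orientation",
             "Visualization"]
psychologists = [4.56, 4.60, 3.99, 4.38, 1.82, 1.68, 1.55]   # Table 1 means (N=5 raters)
incumbents    = [4.40, 4.13, 3.54, 3.59, 2.72, 1.57, 1.95]   # Table 1 means (incumbents)

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def ranks(values):
    """1-based ranks (smallest = 1); tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

r_pm = pearson(psychologists, incumbents)                      # product-moment r
r_rank = pearson(ranks(psychologists), ranks(incumbents))      # rank-order correlation

# Dichotomize each mean rating at 3.0 ("Important") and count matching decisions.
agreements = sum((p >= 3.0) == (q >= 3.0) for p, q in zip(psychologists, incumbents))

print(f"product-moment r = {r_pm:.2f}, rank-order r = {r_rank:.2f}")
print(f"dichotomized agreement: {agreements} of {len(ABILITIES)} abilities")
```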

STUDY 2<br />

Wernimont (1988) has indicated that governmental guidelines on employee<br />

selection still emphasize the necessity of focussing on job tasks and duties in<br />

job analysis, followed by documentation and justification for the inferences made<br />

about needed abilities. Perhaps this is a function of the fact that, as Schmidt<br />

(1988) points out, the empirical data upon which these governmental guidelines<br />

are based are inadequate in many areas, particularly job analysis.<br />

Our job-analytic research provided an opportunity to add to the empirical database by determining if job-ability linkage results obtained in a "holistic" manner (i.e., by having job incumbents rate the importance of an ability to overall job success) were comparable to the job-ability linkage results obtained

by requiring incumbents to rate the importance of abilities for each duty they<br />

perform. If the results obtained from the two methods were found to be similar,<br />

significant reductions in the cost, as well as the intrusiveness, of the job<br />

analysis process for test development could be possible.<br />

Data Collection: As indicated earlier, in the job analysis inventory the<br />

incumbents were asked to rate each of the seven abilities, using a five-point<br />

rating scale, for their importance to overall job performance (i.e., the holistic<br />

approach). Average ability importance ratings were then computed for each job<br />

family.<br />

After rating the importance of the ability to the overall job, these same incumbents were asked to rate the importance of each of the seven abilities to the performance of each individual job duty they had previously indicated they performed (i.e., the traditional fine-grained, duty-ability linkage approach). Using this traditional approach, a mean ability rating was determined by summing each incumbent's ability rating for each duty performed and then dividing by the number of duties that the incumbent performed. Averages were then computed for each job family for each ability.
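
As a rough illustration of this fine-grained aggregation, the sketch below (with made-up incumbents, job families, and ratings; none of these values come from the study) averages one ability's duty-level ratings within each incumbent and then averages those means by job family.

```python
from statistics import mean
from collections import defaultdict

# incumbent -> (job family, duty-level ratings for one ability, one per duty performed)
duty_ratings = {
    "incumbent_01": ("Business/Finance", [5, 4, 4, 3]),
    "incumbent_02": ("Business/Finance", [4, 4, 5]),
    "incumbent_03": ("Personnel/Admin",  [3, 2, 4, 4, 3]),
}

per_family = defaultdict(list)
for family, ratings in duty_ratings.values():
    per_family[family].append(mean(ratings))      # mean across the duties this incumbent performs

for family, incumbent_means in per_family.items():
    print(family, round(mean(incumbent_means), 2))  # job-family average for the ability
```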

RESULTS<br />

The average ratings for each ability across duties performed, for the same<br />

job families, are presented in Table 2. For ease of comparison, the mean overall


ratings for the incumbents given in Table 1 are repeated in Table 2.

Table 2. Comparison of incumbents' mean ability ratings across job-specific duties and mean overall ability ratings, by job family.

                                 Job Specific                Holistic
                                 Mean    S.D.      n      Mean    S.D.       n

Business, Finance & Management Occupations
Verbal Comprehension             4.01    (.77)   2288     4.40    (1.10)   2306
General Reasoning                3.99    (.65)   2287     4.13    (1.13)   2306
Number Facility                  3.22    (.85)   2250     3.54    (1.27)   2306
Logical Reasoning                3.59    (.80)   2246     3.59    (1.29)   2305
Perceptual Speed                 2.57    (.91)   2101     2.72    (1.33)   2306
Spatial Orientation              1.96    (.96)   1737     1.57    (1.29)   2306
Visualization                    2.38   (1.08)   1897     1.95    (1.46)   2306

Personnel, Administration & Computer Occupations
Verbal Comprehension             4.12    (.75)   1197     4.51    (1.03)   1197
General Reasoning                4.11    (.62)   1194     4.25    (1.07)   1197
Number Facility                  2.66    (.94)   1155     3.07    (1.31)   1197
Logical Reasoning                3.74    (.77)   1174     3.59    (1.29)   1197
Perceptual Speed                 2.20    (.96)   1048     2.72    (1.33)   1197
Spatial Orientation              1.73    (.88)    899     1.57    (1.29)   1197
Visualization                    2.08   (1.05)    938     1.95    (1.46)   1197

The average estimate of reliability (Cronbach's alpha) of the ratings across<br />

the six job families was .80 for the incumbents' ratings across duties and .84<br />

for the incumbents' holistic ratings.<br />

There was very high agreement between the incumbents' holistic ratings and

the average job-specific duty ratings in terms of relative importance. The<br />

product-moment correlations among the mean ability ratings from the two types<br />

of ratings ranged from .98 to .99 and rank order correlations ranged from .85<br />

to 1.00.<br />

In terms of the absolute importance ratings given to the abilities, tests of the significance of the difference between the means for the two types of ratings revealed that, in all but one case, the pairs of means were statistically different.

When the mean ratings for both types of ratings were dichotomized into those<br />

determined to be important (again, equal to or greater than 3.0--"Important" on<br />

the five-point scale) and those determined not to be important (less than 3.0<br />

on the five-point scale), the two types of ratings were found to be in agreement<br />

in all but three instances.<br />

STUDY 3

Data Collection: As part of the five-section inventory, incumbents were asked to rate 57 generalized work behaviors (GWB's) developed specifically for the 113 professional and administrative occupations (see O'Leary, Rheinstein, and McCauley, 1990, for a detailed discussion of the development of the GWB's). The

GWB's were rated for relative importance and relative time spent. Incumbents<br />

were first asked to check the GWB's they perform. Then, they rated the ones they<br />

checked using a 5-point relative importance scale ranging from "1 - Unimportant"<br />

to "5 - Crucial" and a 5-point relative time spent scale ranging from "1 - Very<br />

much below average time" to "5 - Very much above average time." Each of the<br />

55


atings for the 57 GWB's were correlated across occupations yielding 57<br />

correlations.<br />

RESULTS<br />

Correlations between the ratings on the two scales ranged from .77 to .93 for the 57 GWB's, with a mean r of .89, indicating a strong relationship between relative importance and relative time spent ratings.
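
A hedged sketch of this Study 3 computation is shown below. The GWB labels and rating values are hypothetical; only the structure of the calculation (one importance/time-spent correlation per GWB across occupations, then the mean r) follows the text.

```python
from statistics import mean

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# GWB -> (mean importance rating by occupation, mean time-spent rating by occupation);
# the study used 57 GWB's, only two invented ones are shown here.
gwb_ratings = {
    "writes_correspondence": ([4.2, 3.8, 4.5, 2.9], [4.0, 3.6, 4.4, 3.1]),
    "keeps_records":         ([3.1, 3.9, 2.7, 3.4], [3.3, 3.7, 2.5, 3.6]),
}

rs = [pearson(importance, time_spent) for importance, time_spent in gwb_ratings.values()]
print("mean r across GWB's:", round(mean(rs), 2))
```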

DISCUSSION<br />

Sackett, Cornelius, and Carron (1981), Cornelius, Schmidt, and Carron (1984), and others have shown in a classification setting that holistic judgments compared favorably with those made on the basis of large-scale job analyses.

Study 2, described above, showed similar results in that the relative<br />

importance of abilities as measured through linkages with job-specific duties<br />

was nearly identical to that obtained from linkages with the job as a whole.<br />

The results obtained in Study 1 suggest that similar ratings of the importance<br />

of abilities to job performance can be obtained from holistic ratings made by<br />

two types of raters--psychologists and job incumbents. The determination of

which abilities were important to job performance was identical for the two<br />

groups of raters.<br />

At first glance, these findings would appear to make a strong case for saying there is overkill in the job analysis process and that it is possible to streamline job analysis procedures for test development. In situations where one needs results in a hurry, holistic methods can be used. In addition, in this study it was found that it is not necessary to have incumbents rate both importance and time spent, unless occupations such as police officer or fireman are being studied. It is well known that it is important for police officers to be able to use a gun properly, even though they may not spend a lot of time doing it.

However, the equivalence of the results obtained from the three sources and, thus, the interchangeability of the sources, ultimately depends upon the use to which the information will be put. As was mentioned above, if job analysts want to determine which abilities are important for job performance, the three sources of data produce virtually equivalent results. If other types of decisions are to be made (e.g., weighting the parts of an ability test battery to achieve a composite score), the absolute differences among the mean ratings could produce different results. While one could not claim that the results obtained by the three different methods were equivalent in the terms outlined by Gulliksen (1968), it would seem that they could be used interchangeably in some circumstances.

REFERENCES<br />

Butler, S.K. and Harvey, R.J. (1988). A comparison of holistic versus decomposed rating of Position Analysis Questionnaire work dimensions. Personnel Psychology, 41, 761-771.

Cornelius, E.T., Schmidt, F.L., and Carron, T.J. (1984). Job classification approaches and the implementation of validity generalization results. Personnel Psychology, 37, 247-260.

French, J.W., Ekstrom, R.B., and Price, L.A. (1963). Kit of reference tests for cognitive factors. Princeton, N.J.: Educational Testing Service.

Gulliksen, H. (1968). Methods for determining equivalence of measures. Psychological Bulletin, 70(6), 534-544.

Harvey, R.J. (1989). Incumbent versus supervisor ratings of task inventories: Overrating, underrating, contamination, and deficiency. In press.

Landy, F.J. (1988). Selection procedure development and usage. In S. Gael (Ed.), The job analysis handbook for business, industry and government. New York: John Wiley and Sons, Inc.

Northrop, L.C. (1989). The psychometric history of selected ability constructs. U.S. Office of Personnel Management.

O'Leary, B.S., Rheinstein, J., and McCauley, D.E. (1990). Developing job families using generalized work behaviors. Proceedings of the Annual MTA Conference, Orange Beach, AL.

Peterson, N.G. and Bowans, D.A. (1982). Skill, task structure, and performance acquisition. In Dunnette, M.D. and Fleishman, E.A. (Eds.), Human performance and productivity: Human capability assessment. Hillsdale, N.J.: Lawrence Erlbaum Associates.

Rosse, R.L., Borman, W.C., Campbell, C.H., and Osborn, W.C. (1984). Grouping Army occupational specialties by judged similarity. Unpublished paper.

Sackett, P.R., Cornelius, E.T., and Carron, T.J. (1981). A comparison of global judgment versus task-oriented approaches to job classification. Personnel Psychology, 34, 791-804.

Schmidt, F.L., Hunter, J.E., and Pearlman, K. (1981). Task differences as moderators of aptitude test validity in selection: A red herring. Journal of Applied Psychology, 66, 166-185.

Schmitt, N. (1987). Principles III: Research issues. Paper presented at the second annual conference of the Society for Industrial and Organizational Psychology.

U.S. Equal Employment Opportunity Commission, U.S. Civil Service Commission, U.S. Department of Labor, and U.S. Department of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43(166), 38290-38309.

Weismuller, J.J., Staley, M.R. & West, S. (1989). CODAP: A comparison of single versus multi-factor task inventories. Proceedings of the Annual Military Testing Association Conference, San Antonio, TX.

Wernimont, P.F. (1988). Recruitment, selection and placement. In S. Gael (Ed.), The job analysis handbook for business, industry, and government. New York: John Wiley and Sons, Inc.


Developing Job Families Using Generalized Work Behaviors

Brian S. O'Leary
Julie Rheinstein
Donald E. McCauley, Jr.

U.S. Office of Personnel Management

Introduction

This paper describes one phase of a large-scale research project aimed at developing and refining a list of work behaviors common to approximately 100 different Federal professional and administrative occupations. In this phase of our research, we were attempting to form job families and to describe how these job families differ in terms of the relative time spent on general work behaviors.

Traditional systems for describing jobs have usually focussed on describing a single job rather than attempting to determine the similarity among jobs. Thus, many of the traditional means of describing jobs, such as task analysis, are somewhat limited when one tries to compare across jobs.

Since one of the ultimate uses for our research was the development and documentation of selection tests, we needed a method of comparing jobs using some form of work behavior as a unit of measurement. Our goal was to develop a method of comparing jobs in such a way as to be consistent with provisions of the Uniform Guidelines. The Guidelines define "work behavior" in the

following manner: "an activity performed to achieve the objectives of the<br />

job. Work behaviors involve observable (physical) components and unobservable<br />

(mental) components. A work behavior consists of the performance of one or<br />

more tasks. Knowledges, skills, and abilities are not behaviors although they<br />

may be applied in work behaviors" (Section 16, 43FR38308).<br />

Development of the Generalized Work Behaviors (GWB's)

First it was proposed that a list of occupation-specific duties be constructed. A list of generalizable work behaviors could then be generated by grouping the occupationally specific duties in terms of common underlying work behaviors.

We extended the work of Outerbridge (1987) in the development of our GWB's.<br />

Outerbridge had developed a list of 32 GWB's. She used duty statements<br />

contained in the occupational definitions in the Dictionary of Occupational<br />

Titles (DOT) for 24 populous Federal professional and administrative<br />

occupations.<br />

A list of 223 duty statements was extracted from the DOT. Each one was placed on a separate card. These duty statements were then sorted into categories describing similar work behaviors, first by a group of 10 personnel psychologists and later by a group of 10 occupational specialists. Nineteen sorters provided usable data. The 19 separate sorts were summarized and compared after transformation into matrix form. Matrix representation allowed the development of final work behavior categories using cluster analysis to discover the structure within the summarized data and also allowed the quantitative comparison of the sorter categorizations. A list of 32 defined work behaviors was developed. Final categories were named and definitions were added to suggest the commonality among duty statements making up each category.
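
The matrix step described above can be illustrated with a short sketch. The duty labels, sorters, and category names below are hypothetical; the code only shows how each sort becomes a duty-by-duty co-occurrence matrix and how the matrices are summed across sorters to form the similarity data a cluster analysis would then operate on.

```python
from itertools import combinations

duties = ["d1", "d2", "d3", "d4"]
# one dict per sorter: duty -> category label assigned by that sorter
sorts = [
    {"d1": "A", "d2": "A", "d3": "B", "d4": "B"},
    {"d1": "A", "d2": "B", "d3": "B", "d4": "B"},
]

n = len(duties)
index = {duty: i for i, duty in enumerate(duties)}
summary = [[0] * n for _ in range(n)]            # summed co-occurrence counts

for sort in sorts:
    for a, b in combinations(duties, 2):
        if sort[a] == sort[b]:                   # the two duties were placed together
            i, j = index[a], index[b]
            summary[i][j] += 1
            summary[j][i] += 1

for row in summary:
    print(row)                                    # pooled similarities for clustering
```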

In the present study we began by reviewing OPM's Classification and Qualification Standards for each of 113 professional and administrative occupations. For each occupation, the major job-specific duty statements were extracted from the Standards. Approximately 10 to 15 major duty statements were obtained for each occupation. In total, over 1,400 job-specific duty statements were developed.

Using the 32 GWB's developed by Outerbridge, we had four psychologists sort each of the 1,400+ job-specific duties into the 32 GWB's, if applicable. Sorters were instructed to sort the duties on the basis of work behaviors. Job-specific duties that could not be sorted into the 32 GWB's were placed into a miscellaneous category. Sorters were advised to put job-specific duties into the miscellaneous category if they had reservations about placing them in any one of the generalized work behavior categories. Sorters were also instructed to develop new generalized work behavior categories if they found that several job-specific duties did not fit into any of the GWB categories but seemed to describe a common underlying work behavior.

For the group of sorters, the average time required for the sorting task was approximately 8 hours. Sorters generally broke up the task into two half-day segments. If three out of the four sorters classified a specific job duty into a GWB category, we considered it to be a match. Using this criterion, about 75% of the 1,400+ job-specific duties were able to be classified into the 32 GWB's.
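
A minimal sketch of the 3-out-of-4 match rule follows; the duty statements and GWB labels are invented for illustration, and the vote-counting approach is an assumption about how such a rule could be applied.

```python
from collections import Counter

# duty statement -> the GWB category chosen by each of the four sorters
sorter_choices = {
    "prepares budget estimates": ["budgeting", "budgeting", "budgeting", "planning"],
    "drafts press releases":     ["writing", "public_info", "writing", "misc"],
}

for duty, choices in sorter_choices.items():
    category, votes = Counter(choices).most_common(1)[0]
    if votes >= 3:                               # the 75% (3 of 4) criterion
        print(f"{duty!r} -> {category}")
    else:
        print(f"{duty!r} -> unclassified")
```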

The original sorters also developed 18 additional GWB categories. Using the 332 job-specific duties that could not be sorted into the original 32 GWB's, another group of 4 psychologists sorted these job-specific duties into the 18 additional GWB's and were also told to develop new GWB's if necessary and appropriate.

In total, 25 additional GWB's were developed. Using the same 75% agreement criterion, 290 more job-specific duties were classified into a generalized work behavior category. Out of a total of 1,438 job-specific duties, only 42, or about 3%, could not be classified into a generalized work behavior. Table 1 shows two examples out of the total 57 GWB's developed.

Table 1 Examples of Generalized Work Behaviors<br />

1. Presents information about work of the organization to others: e.g., Describes agency programs and services to individuals or groups in the community or to higher management.

2. Applies regulations to organizational programs and activities: e.g., Selects and interprets laws to ensure uniform application on wage and hour or safety and occupational health issues and in the sale and leasing of property.



Rating the Generalized Work Behaviors

These 57 GWB's were included in a five-section inventory that was sent to about 14,000 incumbents in 113 occupations. Approximately 7,000 inventories were completed and returned. Of the 7,000 inventories that were received, about 6,000 were from the 94 occupations under study herein. As part of the inventory, incumbents were first asked to read all the GWB's and then check the ones they perform.

One of the first questions we investigated was what types of GWB's are performed most often across jobs. Table 2 presents the six GWB's that are performed the most as well as the six that are performed the least.

Table 2 Most Frequently and Least Frequently Performed Generalized Work<br />

Behaviors<br />

Most frequently performed<br />

Writes correspondence, memoranda, manuals, technical reports, or reports<br />

of activities and findings.<br />

Interviews or confers with persons to obtain information not otherwise<br />

conveniently available or gathers facts on specific issues from

knowledgeable persons: e.g., Interviews persons, visits establishments,<br />

or confers with technical or professional specialists to obtain<br />

information or clarify facts.<br />

Analyzes and interprets information and makes recommendations based on<br />

findings: the information can be numerical or presented in verbal or<br />

pictorial form.<br />

Responds to inquiries from the public, other agencies, Congress, etc.,

concerning the work of the activity.<br />

Keeps records and compiles statistical reports.<br />

Reviews documents for conformance to standard procedures verifying<br />

correctness and completeness of data and authenticity of documents:<br />

e.g., May audit financial data.<br />

Least frequently performed<br />

Performs policing functions such as arresting and detaining persons and<br />

seizing contraband.<br />

Writes, tests, and documents computer programs.<br />

Sells property or arranges for disposal of property, supplies or records: e.g., Inventories, advertises and sells a delinquent taxpayer's seized property or disposes of archival records.

Plans and directs organization's public relations function.

Inspects persons, baggage, or other materials. Inspection involves at least some physical action by the inspector.

Drafts regulations based on an analysis of information: e.g., Drafts<br />

regulations on transportation systems or employment and training<br />

legislation.<br />

As can be seen in Table 2, writing, interviewing, record-keeping, ensuring compliance with regulations, and providing information to the public are the GWB's that are most frequently performed across these professional and administrative occupations. The least frequently performed GWB's are those that are more specific to a particular occupation, such as police work.


I General management and supervisory functions<br />

II Evaluating programs and ensuring compliance with regulations<br />

III Dissemination of information<br />

IV Gathering, classifying, and organizing information<br />

V Budgeting and accounting functions<br />

VI Application of rules and regulations - making determinations<br />

VII Planning and developing policy and procedures<br />

VIII Computer utilization<br />

IX Police functions<br />

X Investigating and arbitrating<br />

XI Interviewing

The next question addressed was "how do the job clusters differ on the 11 factors of GWB's?" A dimension score was calculated for each factor by summing the item scores which loaded on that factor. These scores were then standardized. A mean profile on the 11 factors was computed for each of the six job clusters formed from the Q-factor analysis of the GWB's. Table 3 lists for each job cluster the GWB factors that were rated above the mean for relative time spent.
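
The dimension-score calculation can be sketched as follows. The factor names, item assignments, ratings, and cluster labels are all hypothetical; the code simply sums the items loading on a factor, standardizes the sums across occupations, and averages them by cluster.

```python
from statistics import mean, pstdev

# hypothetical factor -> GWB items loading on it
factor_items = {"dissemination": ["gwb_07", "gwb_12"],
                "budgeting":     ["gwb_22", "gwb_23"]}

# one row of GWB time-spent ratings per occupation, plus its cluster assignment
occupations = [
    {"cluster": "public_info", "gwb_07": 4, "gwb_12": 5, "gwb_22": 1, "gwb_23": 2},
    {"cluster": "business",    "gwb_07": 2, "gwb_12": 2, "gwb_22": 4, "gwb_23": 5},
    {"cluster": "business",    "gwb_07": 3, "gwb_12": 2, "gwb_22": 5, "gwb_23": 4},
]

for factor, items in factor_items.items():
    raw = [sum(occ[i] for i in items) for occ in occupations]   # dimension scores
    m, s = mean(raw), pstdev(raw)
    z = [(v - m) / s for v in raw]                               # standardized scores
    by_cluster = {}
    for occ, score in zip(occupations, z):
        by_cluster.setdefault(occ["cluster"], []).append(score)
    profile = {cluster: round(mean(scores), 2) for cluster, scores in by_cluster.items()}
    print(factor, profile)                                       # mean profile per cluster
```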

Table 3 Important generalized work behavior factors by occupational cluster<br />

I. General Business and Administration
   A. General management and supervisory functions
   B. Evaluating programs and ensuring compliance with regulations
   C. Gathering, classifying and organizing information
   D. Budgeting and accounting functions
   E. Planning and developing policy and procedures
   F. Computer utilization

II. Claims Examining Occupations
   A. Application of rules and regulations - making determinations
   B. Investigating and arbitrating

III. Law Enforcement Occupations
   A. Police functions
   B. Investigating and arbitrating
   C. Interviewing
   D. Application of rules and regulations - making determinations

IV. Public Information Occupations
   A. Dissemination of information
   B. Interviewing

V. Industrial/Labor Relations
   A. Investigating and arbitrating
   B. Interviewing
   C. Evaluating programs and ensuring compliance with regulations

VI. Specialized Program Analysis
   A. Gathering, classifying and organizing information


SUMMARY<br />

This exploratory study was one of the first applications of the GWB's. Certainly, the GWB's need refinement but, at this stage of development, the results look promising. The results obtained in this study make sense intuitively in terms of the GWB's performed the most and least across jobs, job dimensions, and clusters of related jobs.

REFERENCES<br />

Ford, J.K., MacCallum, R.C. & Tait, M. (1986). The application of exploratory factor analysis in applied psychology: A critical review and analysis. Personnel Psychology, 39, 291-314.

Leaman, J. and Steinberg, A.G. (1990). Factor analysis versus CODAP hierarchical clustering for a leadership task analysis. Paper presented at the 98th Annual American Psychological Association Conference, Boston, MA.

Outerbridge, A.N. (1981). The development of generalizable work behavior categories for a synthetic validity model. Washington, D.C.: U.S. Office of Personnel Management, Personnel Research and Development Center.

SAS Institute, Inc. (1985). SAS user's guide: Statistics (Version 5). Cary, NC: SAS Institute, Inc.

U.S. Equal Employment Opportunity Commission, U.S. Civil Service Commission, U.S. Department of Labor, & U.S. Department of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43(166), 38290-38303.


A COMPARISON OF HOLISTIC AND TRADITIONAL
JOB-ANALYTIC METHODS

Brian S. O'Leary, Julie Rheinstein, and Donald E. McCauley, Jr.
U.S. Office of Personnel Management
Washington, D.C.

INTRODUCTION<br />

Job analysis is the foundation of many personnel systems including selection, performance appraisal, and training. Most often, lengthy inventories are developed and administered to job incumbents. This process can be very time-consuming and cost-intensive.

Several researchers have looked at methods of reducing the time and<br />

the cost of job analysis. Grouping jobs on the basis of work<br />

behaviors provides one way of reducing the cost of examination<br />

development while not sacrificing test validity. Barnes and<br />

O'Neill (1978) grouped jobs for examination development in the<br />

Canadian Public Service. Rosse, Borman, Campbell, and Osborn<br />

(1984) clustered U.S. Army enlisted jobs into homogeneous groups<br />

according to rated job content in order to choose a representative<br />

sample of MOS's for test validation purposes. Rosse et al.<br />

clustered the jobs by sorting them on the basis of holistic job<br />

descriptions.<br />

Using a methodology similar to that used by Rosse et al.,<br />

Rheinstein, McCauley, and O'Leary (1989) compared sources of job<br />

information (i.e., the people doing the sorts). McCauley, O'Leary,<br />

and Rheinstein (1989) compared the job groupings that resulted when<br />

the sorters received varying amounts of job information. These<br />

studies provided some of the data to be presented below.<br />

The purpose of the present study was to compare a traditional<br />

method of job analysis (administering an inventory to a large<br />

sample of job incumbents) to the more holistic methods described<br />

above.<br />

METHOD

Data Collection for the Holistic Methods

A) Eighty-seven professional and administrative occupations in the<br />

Federal civilian work force were studied. Personnel research<br />

professionals and staffing specialists grouped the occupations into<br />

categories according to similarity of work behaviors. These raters<br />

were given descriptions of the 87 jobs which were taken from the Federal Government's Handbook of Occupational Groups and Series of Classes (1969). The job descriptions consisted of the job title and a brief narrative which summarized the major duties of the job.


These job descriptions were printed on 5 x 9 cards and given to the<br />

raters for sorting. The General Schedule (GS) series numbers were<br />

not included. Raters were asked to sort the jobs according to<br />

similarities in work behaviors. No limitations were put on the<br />

number of categories each rater could generate.<br />

Two groups completed the sort: (1) nine members from the Office of Personnel Research and Development (OPRD) at the U.S. Office of Personnel Management, consisting of eight personnel research psychologists and a personnel staffing specialist (the "psychologists"), and (2) seven personnel staffing specialists from seven different federal agencies (the "staffing specialists").

B) A second group of staffing specialists sorted just the job

titles. The GS series numbers were not included. These raters<br />

also were asked to sort the jobs according to what they perceived<br />

to be similarities in work behaviors based on the job titles. No<br />

limitations were put on the number of categories the rater could<br />

generate.<br />

The categories resulting from each of the sorts were transformed<br />

into an 87 by 87 matrix for each rater wherein a one in a cell

indicated that those two jobs were placed in the same category by<br />

the rater and a zero in a cell indicated that the two jobs were not<br />

placed together. The matrices thus derived were added together<br />

producing three summary matrices - one for the psychologists, one<br />

for the staffing specialists using job descriptions, and one for<br />

the staffing specialists using job titles only. These matrices<br />

were then factor analyzed. The six-factor solutions accounted for<br />

68.1% of the variance for the psychologists, 70.4% for the staffing<br />

specialists using the job descriptions, and 68.3% for the staffing<br />

specialists using job titles only. The overall agreement between<br />

the psychologists and the staffing specialists using the job<br />

descriptions was 60%. There was an agreement of 56.6% in the<br />

classification of the jobs between the staffing specialists using<br />

the job descriptions and the staffing specialists using job titles<br />

only.<br />

Data Collection for the Job Inventory Method

A five-section inventory that included a section of generalized<br />

work behaviors (GWB's) developed specifically for the professional<br />

and administrative occupations under study was administered to job<br />

incumbents. Approximately 14,000 inventories were sent out to<br />

incumbents, and approximately 6,000 inventories were completed and<br />

returned. As part of the inventory incumbents were first asked to<br />

read all the GWB's and then check the ones they perform.<br />

Incumbents were then asked to rate the GWB's in terms of relative<br />

time spent using a five-point scale ranging from "1 -Very much<br />

below average time" to "5 - Very much above average time." Mean<br />

time spent ratings were calculated for each GWB for each job.<br />



These means were factor analyzed to produce job groupings. The<br />

six-factor solution accounted for 77.4% of the variance.<br />

Experimental Design

The results of the factor analyses derived from each of the holistic methods were compared to the results derived from the job inventory method. For this study, the job inventory method was considered to be the criterion and the holistic methods were considered predictors. An agreement was defined as occurring when a predictor agreed with the criterion concerning the placement of a job in a group. The percentage of agreement is the total number of agreements divided by the total number of jobs.
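
A minimal sketch of this agreement index, with invented job and group labels (the real study compared 87 occupations across six groupings), is shown below.

```python
# criterion: groupings from the job inventory factor analysis
criterion = {"job_a": 2, "job_b": 2, "job_c": 5, "job_d": 3}
# predictor: groupings from one of the holistic sorting methods
predictor = {"job_a": 2, "job_b": 1, "job_c": 5, "job_d": 3}

agreements = sum(criterion[job] == predictor[job] for job in criterion)
print(f"percentage of agreement = {100 * agreements / len(criterion):.1f}%")
```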

RESULTS<br />

The number of jobs in each grouping for the three holistic methods<br />

and for the job inventory method is shown below in Table 1.<br />

Table 1
Number of Jobs in Each Grouping for Each Method

                                         Grouping
Method                            1    2    3    4    5    6

Job Inventory                     3   45    8   10   17    4
Psychologist (Job Description)   17   24    7   10   16   13
Staffing Specialist
  (Job Description)               4   34    7    9   14   19
Staffing Specialist
  (Job Titles)                    2   19   12   10   30   14

As one can see from this table, the number of jobs per grouping was<br />

relatively stable across all four methods in Groupings 3 and 4.<br />

Groupings 1, 2, and 5 produced relatively good agreement in terms<br />

of the number of jobs to be included between the job inventory<br />

method and two of the three holistic methods. In Grouping 6, there<br />

was relatively good agreement across the three holistic methods but<br />

not with the job inventory method.<br />



Table 2 below illustrates the degree of agreement between each of the holistic methods and the job inventory method. In this table, the number of jobs correctly assigned to each grouping is presented for each holistic method. The percentage of agreement is also presented for each holistic method.

Table 2
Number of Jobs Correctly Assigned for Each Holistic Method

                                         Grouping                     Percentage
Method                            1    2    3    4    5    6   Total  of Agreement

Psychologist (Job Description)    1   20    5    9   12    2     49      56.3%
Staffing Specialist
  (Job Description)               0   25    5    8   10    2     50      57.5%
Staffing Specialist
  (Job Titles)                    0   18    0    8   13    3     42      48.3%

The agreement with the job inventory method was relatively similar<br />

for the two groups working with the short job descriptions and<br />

somewhat lower for the group working only with job titles.<br />

When the factor loadings derived from the job inventory method were examined more closely, it was found that for 19 jobs the difference between the primary loading and the secondary loading was less than 0.1. This finding indicates that, in terms of generalized work behaviors, these jobs could be classified equally well in either of two groupings. It was decided that the definition of agreement could reasonably be expanded to include agreement with either the primary or the secondary grouping for these 19 jobs. Under this revised definition, the percentages of agreement rise to 64.4% for the staffing specialists working with job descriptions, 63.2% for the psychologists, and 54% for the staffing specialists working only with job titles.
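
The expanded rule can be sketched as follows; the factor loadings and group assignments are hypothetical, and only the 0.1 cutoff and the "primary or secondary" acceptance rule come from the text above.

```python
jobs = {
    # job -> (factor loadings by grouping from the inventory analysis, holistic grouping)
    "job_a": ({1: 0.62, 2: 0.21, 3: 0.10}, 1),
    "job_b": ({1: 0.44, 2: 0.41, 3: 0.15}, 2),   # near-tie: grouping 1 or 2 acceptable
}

agreements = 0
for loadings, holistic_group in jobs.values():
    ranked = sorted(loadings, key=loadings.get, reverse=True)
    primary, secondary = ranked[0], ranked[1]
    acceptable = {primary}
    if loadings[primary] - loadings[secondary] < 0.1:   # loadings too close to distinguish
        acceptable.add(secondary)
    agreements += holistic_group in acceptable

print(f"expanded agreement: {agreements} of {len(jobs)} jobs")
```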

Similar results were obtained when the percentages of agreement were calculated using only the 68 jobs for which there were unique


factor loadings (i.e., where the difference between the primary and<br />

secondary loadings was greater than 0.1). Using these 68 jobs, the<br />

percentages of agreement were 64.7% for the staffing specialists<br />

using job descriptions, 66.2% for the psychologists, and 55.9% for<br />

the staffing specialists using job titles only.<br />

DISCUSSION<br />

The findings of this study are somewhat hard to interpret. Agreements of 56% to 66% are too high to conclude that the holistic

methods have no merit for the purpose of grouping jobs but not high<br />

enough to advocate their replacing traditional job inventory<br />

procedures. The cause of this inability to make a clear<br />

determination may well be the criterion measure itself (i.e., a job<br />

inventory based on work behaviors) since there was extremely high<br />

agreement between holistic and traditional approaches when the jobs<br />

were viewed in terms of ability requirements rather than work<br />

behaviors (Rheinstein, O'Leary, and McCauley, 1990).<br />

There are two factors that should be examined as causing this lack of clarity in the criterion. The first is the nature of the jobs

under study. Agreement was consistently higher across all four<br />

methods for some groupings (Groupings 4 and 5) than for others.<br />

The jobs within Group 4 were primarily enforcement jobs, and those<br />

in Group 5 were primarily jobs dealing with claims examining. The<br />

jobs in the other groups were more general in nature. The fact<br />

that there was no clear factor loading for 19 jobs (21.8%) means<br />

that there was much overlap of work behaviors among the jobs and<br />

that they could be equally well grouped in more than one way.<br />

The second factor to consider is the use of generalized work<br />

behaviors. It may be that the 57 GWB's used in this study were not

sufficient to distinguish clearly among the 87 jobs. This<br />

hypothesis is supported by the fact that when the job-specific<br />

duties were grouped to develop the GWB's, there were 42 duties (or<br />

3% of the total number of duties) which could not be classified<br />

into one of the 57 GWB's (O'Leary, Rheinstein, and McCauley, 1990).<br />

The development and use of additional GWB's could add other<br />

dimensions upon which groupings would differ more distinctly,<br />

thereby facilitating the assignment of jobs.<br />

Despite the shortcomings mentioned above, the use of elements such<br />

as the GWB shows promise for grouping jobs on the basis of work<br />

behaviors. An inventory that consisted of truly job-specific<br />

duties (or tasks) would not only be unwieldy but would also not<br />

permit grouping of jobs because there would be little or no overlap

of work behaviors across jobs.<br />

Until further advances are made in this area, the question of the<br />

efficacy of holistic methods of job grouping remains unresolved.


However, the degree of agreement obtained in this study argues for<br />

pursuing research in this area.<br />

REFERENCES<br />

Barnes, M. & O'Neill, B. (1978). Empirical analysis of selection<br />

test needs for 10 occupational groups in the Canadian Public<br />

Service. Paper presented to the meeting of the Canadian<br />

Psychological <strong>Association</strong>, Ottawa, June, 1978.<br />

McCauley, D.E., O'Leary, B.S., & Rheinstein, J. (1989)'. A -'<br />

comparison of two holistic rating methods for grouping<br />

occupations. Presentation at the Conference of the <strong>Military</strong><br />

<strong>Testing</strong> <strong>Association</strong>, San Antonio, TX.<br />

O'Leary, B.S., Rheinstein, J. & McCauley, D.E. (1990). Developing<br />

job families using generalized work behaviors. Presentation at<br />

the Conference of the <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>, Orange<br />

Beach, AL.<br />

Rheinstein, J., O'Leary, B.S., & McCauley, D.E. (1990). Addressing<br />

the issue of "quantitative overkill" in job analysis.<br />

Presentation at the Conference of the <strong>Military</strong> <strong>Testing</strong><br />

<strong>Association</strong>, Orange Beach, AL.<br />

Rheinstein, J., McCauley, D.E., & O'Leary, B.S. (1989). Grouping<br />

jobs for test development and validation. Presentation at the<br />

Conference of the <strong>International</strong> Personnel Management<br />

<strong>Association</strong> Assessment Council, Orlando, FL.<br />

Rosse, R.L., Borman, W.C., Campbell, C.H., & Osborn, W.C. (1984).<br />

Grouping Army occupational specialties by judged similarity.<br />

Unpublished paper.<br />

U.S. Civil Service Commission. (1964). Handbook of occupational<br />

groups and series of classes. Washington, DC: U.S. Civil<br />

Service Commission.<br />

69


DeLayne R. Hudgeth
Paul R. Fayfich
The University of Texas at Austin

John S. Price, SQNLDR, RAAF
Air Force Human Resources Laboratory

This research was conducted at the Air Force Human Resources Laboratory (AFHRL) under the 1990 Summer Research Program for faculty and graduate students, sponsored by the Air Force Office of Scientific Research.

Introduction<br />

The USAF Occupational Measurement Squadron (OMSQ), Randolph Air Force Base (AFB), Texas, is responsible for the preparation, administration, and analysis of USAF occupational surveys. Using current procedures, initial survey mail-out and initial data processing range from seven to nine months. Current methods for collecting and processing data for occupational analysis studies are slow, complicated, and expensive. OMSQ has requested AFHRL to investigate a more efficient system of administering occupational surveys. Of particular interest is the possibility of automating the process using personal computers and the use of the Defense Data Network for distribution of surveys and collection of responses.

Objectives of the Research Effort

Five objectives for the effort were determined: (1) to create a computerized version of the Chapel Management specialty job survey (chosen because it was about to be administered via traditional means); (2) to prepare and execute a research design for comparing the two forms of administration; (3) to collect data; (4) to analyze the data and describe the results; and (5) to provide recommendations for further research and development.

The computerized job inventory for use in the survey was developed using Microsoft's QuickBASIC version 4.5. This third-generation language allowed us to write, test, and place software in the field in less than four weeks. Modular development and formative evaluation were used throughout this process.

The software consists of two independent modules that are chained together. The first module contains the Biographical and Background sections of the survey, and has 13 subprocedures and 2,576 lines of code. The second module is the Duty-Task Section and has 18 subprocedures and 1,341 lines of code. It contains two major procedures: the first has the job incumbent reviewing each of 407 tasks and identifying the tasks performed in his or her job; the second procedure has the job incumbent rating relative time spent, with a nine-point scale, on only those tasks identified in the first procedure. In both procedures the incumbents may "back up" to review or change answers and ratings.


To summarize, information generated from the second module included the identification of tasks performed by incumbents in their present job and time rating data for each task. The program also collects data on the amount of time spent in using the module, how many times an incumbent backed up and changed answers, and how many error messages were written to the screen. All data were written out to data files and captured on floppy disk.

F&sear&Design<br />

Ihetwotieper&ntvariableswereformandsquence: thetwofomswere<br />

paper/pencil (P) and amputer-based (C); and the three types of sequence<br />

were: (1) P f011Wed by P, (2) P follmed by c, and (3) c follcweq by P. _.<br />

(Note: Although a fcurth variation of C folluwed by C would have been<br />

desirable, it was a condition of using this population that all persms had<br />

totakeapaper/pencilsumey.. Wejudgedthataskingairm2ntotakethesame<br />

~~~~~timeswouldaffectreliabilityofthedata.)<br />

Test/m-testplxlc&WS were used for cmparing the two survey<br />

achninistrationsandeachrespondentservedashisorher~control. tis<br />

wasnecessaryasno~ioncauldbemadeabcortthecentrdl~~and<br />

dispersion of scorn. EachresporxBnthadauniquepatternofresponseswhi~<br />

describeiihisorherjob.<br />

The three treatments were:

                   Time 1 (T1)    Time 2 (T2)    Original N
Treatment #1           P            P  (P-P)         40
Treatment #2           P            C  (P-C)         20
Treatment #3           C            P  (C-P)         21

In terms of elapsed time between administrations, a number of factors were considered. A review of the literature generally supported the decision that a lapse of two to four weeks between time one and time two would be acceptable.

The purpose of treatment #1 (P-P) was to provide a baseline against which the other treatments could be compared with respect to the variability of test/re-test. Although a perfect match was not expected, a reasonably high match was anticipated, as jobs seldom change much in two weeks.

The P-C and C-P treatments provided comparison data to examine effects due to form of administration. For example, the second survey might yield a higher number of tasks selected than the first, since the first administration might sensitize incumbents to the nature of their jobs. However, this effect could be confounded for the P-C and C-P sequences because it is possible that all C administrations would yield a higher number of responses because C respondents were forced to look at each separate task, whereas with the paper version they might accidentally skim past a relevant task statement.

Administration and Data Collection

About May 24, the traditional P survey was dispatched Air Force-wide by OMSQ using traditional means and methods of distribution. The first


airmen whose surveys were returned to OMSQ were immediately sent a second P administration with an explanatory letter. By July 5, 30 second returns were received and used for analysis.

For this research effort, printed surveys for the P-C treatment were hand-delivered to the Survey Control Officers at Bergstrom, Kelly, and Randolph AFBs. The computerized version was then given within 2-4 weeks following the paper administration. For all computerized administrations, local Z-248 personal computers (PCs) were used. Each respondent used a separate disk for taking the survey. For the C-P treatment a reverse process was used, starting with Brooks and Lackland AFBs. All paper versions were machine-scanned by OMSQ to create a data file on disk, which was then matched and merged with the data strings collected via the PCs.

Results<br />

Any interpretation of the data must take into account, first, that the Chapel Management Specialty, selected for convenience, may not be generally representative of all Air Force jobs. Second, the number of airmen for each treatment was small. Third, there is no "correct" or ideal selection of either job tasks or the relative time spent ratings, hence statistics that rely on central tendency could not be used. Finally, a number of more qualitative techniques such as "think aloud" protocols or follow-up questionnaires that could have addressed some of the issues raised by these data were not possible in the given time frame.

Table 1 summarizes data in terms of the total number of tasks selected by all individuals for each administration, the mean number of tasks selected by each person, the percent of tasks selected in both the first and second administrations, and the mean change in number of tasks selected per individual.
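
The Table 1 summaries can be reproduced with a short sketch. The respondents and task identifiers below are invented, and the exact denominator used for the "selected in both administrations" percentage is an assumption (time-1 selections are used here); the sketch only shows the shape of the calculation.

```python
from statistics import mean

# respondent -> (tasks checked at time 1, tasks checked at time 2)
responses = {
    "airman_01": ({"t01", "t02", "t03"}, {"t01", "t02", "t03", "t04"}),
    "airman_02": ({"t02", "t05"},        {"t02", "t05"}),
}

first_total = sum(len(t1) for t1, _ in responses.values())
both_total  = sum(len(t1 & t2) for t1, t2 in responses.values())
pct_change  = [100 * (len(t2) - len(t1)) / len(t1) for t1, t2 in responses.values()]

print(f"selected in both administrations: {100 * both_total / first_total:.0f}%")
print(f"mean change in tasks selected:    {mean(pct_change):+.1f}%")
```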

Table 1
Summary of Task Selection Data

                            P1 - P2          P1 - C2          C1 - P2
                            (N = 30)         (N = 17)         (N = 17)

Total tasks selected     3,684   3,878    1,808   1,950    2,418   2,267
Mean per respondent        123     129      106     115      142     133
Selected in both adm.        81%              82%              75%
Mean change                +6.5%            +8.4%            -8.9%

Another research issue was whether, for the job tasks selected by incumbents, the estimates of "Time Spent in Present Job" would vary as a function of type of administration. (This is estimated with a nine-point scale: 1 = "Very small amount.") Table 2 shows the change in ratings from the first administration to the second where, for example, for the P1 - P2 administration of 2,988 tasks chosen, 1,283 time ratings were the same, 387 tasks were rated one point higher for more time spent, 342 were rated one point lower for less time spent, etc.



Table 2
Variation in Time Spent Ratings
(Number of tasks at each level of rating change, -8 to +8, from the first to the second administration)

P1 - P2 (N = 30): 2,988 tasks
P1 - C2 (N = 17): 1,488 tasks
C1 - P2 (N = 17): 1,812 tasks

We also wanted to examine the data which reflected job tasks chosen for one administration but not the other, as to whether estimates of "Time Spent..." varied between the two forms of administration. These data are displayed in graphic form in Tables 3, 4 and 5.

Table 3
Time Ratings P1 - P2: tasks not selected in both administrations (N = 30)
(bar chart of number of tasks by time rating, 1-9)
Selected P1 not P2 = 696; selected P2 not P1 = 890.

Table 4
Time Ratings P1 - C2: tasks not selected in both administrations
(bar chart of number of tasks by time rating, 1-9)
Selected P1 not C2 = 320; selected C2 not P1 = 460.

Table 5
Time Ratings C1 - P2: tasks not selected in both administrations (N = 17)
(bar chart of number of tasks by time rating, 1-9)
Selected C1 not P2 = 606; selected P2 not C1 = 455.

Discussion

Perhaps the most important benefits of this research are that a prototype computerized survey now exists and that the potential of automation has been demonstrated. A job survey was administered with a microcomputer (and could be distributed electronically). The data were captured electronically and analysis was accomplished in a few hours.

Tables 4 and5de1~nstmtethene&foradditional research where there<br />

seems to be a disproportionate rkmber of responses for "1" ard "5". Inforrflal<br />

feedbacksuggeststhatthe instructions for estimating "TimsSpentonPresent<br />

JoWcanbeintfxpreteiiin~~~~thanoneway. Ihisszqgeststhatfurther<br />

studyoftheeffecksofthewordingofthese instructionsiswarrant32d.<br />

The P-P baseline data, compared with P1-C2, suggest that these forms of administration are comparable in terms of test/re-test. They both indicate that taking the inventory causes increased sensitivity to one's job in terms of number of tasks chosen. If something like the Hawthorne Effect was operating, an increase in tasks selected in P for the C-P treatment should be evident, and is not found. Data that support the confounding effects of having to see each item on the computer are evident. In Table 1, for P1-C2 and C1-P2, there was an increase in the mean number of items selected of 8.4% when using the computer for the second administration and a decrease of 8.9% when using paper.

The computer version also demonstrated, on a limited basis, how a survey can be "branched" or "skipped" to only display job tasks relevant for a given person based on prior responses. Improved accuracy and significant time savings could result. Also, the ability of the computer to process data during the administration of a job survey could result in new methods and levels of review by job incumbents.
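
As a rough illustration of this branching idea (not the actual QuickBASIC implementation), the sketch below shows only the task statements whose duty area an incumbent has already reported performing; the duty areas and task texts are invented.

```python
tasks = [
    {"duty": "A", "text": "Schedules chapel facilities"},
    {"duty": "A", "text": "Maintains appointment calendars"},
    {"duty": "B", "text": "Prepares budget estimates"},
]

def relevant_tasks(duties_performed):
    """Return only the task statements in duty areas the incumbent reported performing."""
    return [task["text"] for task in tasks if task["duty"] in duties_performed]

print(relevant_tasks({"A"}))   # duty-area B tasks are skipped for this respondent
```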



Recommendations

We recommend that the USAF begin immediately to develop computerized job inventories. In particular: (1) a comprehensive electronic network needs to be designed that will allow OMSQ to electronically administer, process, and archive occupational surveys ("Personnel Concept III" (PC-III), currently under development by the Air Force Military Personnel Center, with gateways to each Air Force Consolidated Base Personnel Office, should be investigated further); (2) because the computer offers display, review, and reporting capabilities not available with traditional paper administration, we strongly recommend that planning efforts to use this capability be undertaken as soon as possible; and (3) research is needed to optimize the design of computerized surveys, which might include branching or skipping, differential feedback based on individual patterns of response, procedures for review and correction of responses by incumbents, and other survey design features which are unique to the computer.

This report describes research which compared paper and pencil versus computer-based administration of a USAF Job Inventory for the Chapel Management specialty. Test/re-test administration procedures, with each subject acting as his or her own control, were used. The data show there was an 81% match for Paper (P) followed by P; 82% for P followed by Computer (C); and 75% for C followed by P. The data suggest that computer-based administration will improve the yield of tasks chosen, but that its use to collect estimates of "Time Spent..." ratings is problematic with the current survey instructions. Computerizing of job surveys is feasible. Additional efforts are needed to create a valid, reliable Air Force-wide automated system.



MPT ENHANCEMENTS TO THE
OCCUPATIONAL RESEARCH DATA BANK:

Joe Menchaca, Jr., Capt, USAF<br />

Jody A. Guthals, 2Lt, USAF<br />

Air Force Human Resources Laboratory (AFHRL/MOD)<br />

Brooks Air Force Base, Texas<br />

Lou Olivier, Glenda Pfeiffer<br />

OAO Corporation<br />

INTRODUCTION<br />

The Occupational Research Data Bank (ORDB) is an on-line, data<br />

repository providing users immediate access to a variety of<br />

occupational information about Air Force specialties (AFS) and the<br />

people who perform duties in them. The combination of several<br />

unique subsystems gives ORDB the ability to retrieve many otherwise<br />

dispersed sets of data from a consolidated data bank. Instead of<br />

the normal laborious and time-consuming task of finding personnel<br />

background information by formal requests to computer data bases,<br />

searching Air Force regulations, or searching a library of<br />

technical reports and previous studies, the ORDB allows the user to<br />

streamline occupational data retrieval by providing easy access to<br />

data from all these sources. Two years ago a paper was presented<br />

discussing some planned enhancements and applications of the ORDB<br />

to assist manpower, personnel, and training (MPT) decision makers<br />

and analysts in the acquisition of Air Force weapon systems<br />

(Longmire and Short, 1988). The purpose of this paper is to

describe implementation of these enhancements and discuss some<br />

actual MPT applications.<br />

BACKGROUND/OVERVIEW<br />

Plans for the development of the ORDB began in 1978. While<br />

vast quantities of information were available about Air Force<br />

occupations, the data were widely dispersed among various different<br />

organizations, with many different formats, and degrees of<br />

coverage. At that time, the Air Force Human Resources Laboratory<br />

(AFHRL) maintained 29 different types of computer files from

many different sources. Also, AFHRL housed Air Force technical<br />

reports dating back to 1943 and was the official Air Force<br />

repository of all occupational study data files generated by the<br />

USAF Occupational Measurement Center (USAFOMC). Other organizations<br />

(HQ USAF, ATC, AFMPC, etc.) had their own data bases and generated<br />

numerous recurring reports, regulations, and studies.<br />

Occupational researchers needed consolidated information that was<br />

easily and rapidly accessible.


The ORDB was designed and continues to reside on the AFHRI,<br />

UNISYS 1100/82 mainframe at Brooks, Texas. The programs within<br />

ORDB were created in a user-friendly, tutorial environment so that<br />

even the most novice of computer users could access its<br />

information. Beyond the original scope of ORDB's development, the<br />

current enhancements to the system focus on ways to make the system

more useful to a variety of users such as researchers, OMC<br />

analysts, and MPT managers who determine MPT requirements for<br />

already existing weapon systems and who must forecast similar<br />

requirements early in the planning stages of new weapon system<br />

acquisitions (Longmire and Short, 1988).<br />

The ORDB provides storage and on-line retrieval of a variety<br />

of occupational data within its seven major subsystems. Figure 1<br />

diagrams the ORDB. It also shows each subsystem's primary area of<br />

use. The check marks within the circles indicate new subsystems<br />

which are described below.

(1) The CODAP (Comprehensive Occupational Data Analysis Programs)

Subsystem allows rapid retrieval of reports from the most recent

occupational study on an AFS.<br />

(2) The Enlisted AFSC Information Subsystem (EAIS) contains AFSC

descriptions (for ladder and career field), progression ladders,<br />

and prerequisites for the years 1978 to the present, and number<br />

change history (1965 - present).<br />

(3) The Officer AFSC Information Subsystem (OAIS) allows retrieval

of officer AFSC information similar to that available in the EAIS<br />

(1976 - present).<br />

[Figure 1 (diagram, not reproduced): the seven ORDB subsystems and each subsystem's primary area of use, with check marks indicating the new subsystems.]

Figure 1. ORDB SUBSYSTEMS

77


(4) The Computer-Assisted Reference Locator (CARL) provides<br />

listings of occupational studies, technical reports, films, and<br />

other documents related to Air Force jobs.

(5) The Enlisted Statistical Subsystem (ESS) provides statistical

distributions of selected data elements for enlisted personnel on<br />

the Uniform Airman Record (UAR) file at the end of the calendar<br />

year as well as personnel with records on the Pipeline Management<br />

System (PMS) file (1987 - present).<br />

(6) The Archived Statistics Subsystem contains pre-generated statistics on demographic, aptitude, education, training, turnover, and duty-related information on Air Force enlisted personnel, previously generated for calendar years 1980-1986. The CY 89 task phased out pre-generated statistics; statistics generated previously are now accessed from this subsystem.

(7) The Weapon System Information Subsystem (WSIS) permits access and retrieval of Air Force occupational information by weapon system, special experience identifier (SEI), or AFSC.

The capabilities which ORDB developers are seeking are best<br />

summarized as an up-to-date occupational research data base,<br />

containing a wide variety of both historical and current<br />

information on United States Air Force enlisted and officer career<br />

fields. The CARL subsystem is continually updated as material<br />

becomes available, AFSC descriptions are updated semi-annually,<br />

Occupational Measurement Center study reports are loaded into the<br />

system on a continual basis as soon as an analysis is complete, and<br />

all modifications are documented in the User's Manual and/or<br />

Procedural Guide on a day-to-day basis (Olivier et al., 1990). Overall,

the conscientious effort to update and maintain the ORDB is the key<br />

to its success.<br />

RECENT ENHANCEMENTS

Presently, research is continuing with the ORDB to facilitate<br />

the planning and analysis of MPT requirements earlier in the weapon<br />

system acquisition process. There are presently two primary areas<br />

of improvement. First was the development of the Weapon System<br />

Information Subsystem (WSIS). A second major enhancement was the<br />

conversion of the Statistical Variable Subsystem from aggregated<br />

occupational statistics to current user-defined population<br />

statistic variables from the UAR and PMS.<br />

As was mentioned earlier, the WSIS allows users to obtain<br />

information cross-referenced between a specific Air Force weapon system, SEIs, and enlisted AFSCs, or any combination thereof. An enlisted AFSC is a six-character field (e.g., 41131C) including

suffix. Prefixes are not used. Special Experience Identifiers are<br />

three digit numeric codes which identify special experience not<br />

otherwise reflected in the USAF enlisted classification structure.<br />

SEIs are used to achieve greater flexibility in the management of

personnel, particularly in the quick identification of specially<br />

_’<br />

78


qualified resources to support contingency operations or<br />

situations. All SEI information was derived from AFR 39-1, Airman

Classification and within the WSIS has been matched to appropriate<br />

weapons systems and AFSCs.<br />

The WSIS can retrieve information by calendar year beginning<br />

with a base year of 1987. It allows a user to enter a weapon<br />

system and obtain all the related enlisted AFSCs and SEIs, or vice<br />

versa. Weapon system identification was derived from the 1988 AF<br />

Magazine Almanac with the intention of creating a comprehensive<br />

listing of existing/active USAF Weapon Systems including all<br />

airplanes, helicopters, missiles, etc. The data are arranged by<br />

mission type (e.g., Strategic Bombers, Trainers, Helicopters)

with the actual weapon systems listed for each mission type.<br />
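As a rough illustration of the cross-referencing just described (the records below are invented placeholders rather than actual WSIS entries; the real subsystem draws its data from AFR 39-1 and the Almanac listing), a lookup keyed on either weapon system or AFSC might be sketched as follows:

# Illustrative cross-reference between weapon systems, enlisted AFSCs, and SEIs.
# The records are invented placeholders, not real WSIS data.

RECORDS = [
    {"weapon_system": "B-52", "afsc": "45737A", "sei": "123"},
    {"weapon_system": "B-52", "afsc": "32830",  "sei": "456"},
    {"weapon_system": "F-15", "afsc": "45737A", "sei": "789"},
]

def by_weapon_system(system):
    """Return all AFSC/SEI pairs associated with one weapon system."""
    return [(r["afsc"], r["sei"]) for r in RECORDS if r["weapon_system"] == system]

def by_afsc(afsc):
    """Return all weapon systems associated with one enlisted AFSC."""
    return sorted({r["weapon_system"] for r in RECORDS if r["afsc"] == afsc})

if __name__ == "__main__":
    print(by_weapon_system("B-52"))   # [('45737A', '123'), ('32830', '456')]
    print(by_afsc("45737A"))          # ['B-52', 'F-15']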

In the past, statistics on 125 variables were computed each<br />

year against the most current UAR, Airmen Gain/Loss (AGL), and PMS<br />

data files and then uploaded into the system. With the new<br />

Enlisted Statistical Subsystem (ESS), this process has recently<br />

changed for the sake of providing current data as soon as it<br />

becomes available. As was mentioned earlier, the ESS is comprised<br />

of records of all enlisted personnel on the UAR file as of 31 Dec

of each calendar year and all personnel with records on the PMS<br />

file who completed training in that year. Statistical data is<br />

requested by Duty AFSC or PMS Course ID. One and two-way<br />

distributions for selected variables are included in the output.<br />

Where appropriate, means, standard deviations, and row/column<br />

counts are also listed.<br />

The UAR as of 31 Dec for each year contains all enlisted

personnel to include active duty projected gains, some recent<br />

losses, etc. AFHRL personnel scrub the file so that the resultant<br />

file contains records of all enlisted personnel on "active duty".<br />

Only certain selected fields are used in the ORDB ESS. A total of<br />

44 UAR variables have been selected for the ESS.<br />

The PMS files contain records of all personnel who attended<br />

training at Air Force Technical Schools. For the purpose of the<br />

ESS, only active duty enlisted personnel who have completed<br />

training in the given year are selected. Four variables have been<br />

selected for the ESS to bring the total to 48 variables. The<br />

variables are listed in Table 1. For a two-way distribution, one variable must be marked with an asterisk.
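A rough sketch of the kind of tabulation the ESS performs is given below; the variable names and records are invented for illustration, and the operational system of course works against the full UAR and PMS extracts rather than an in-memory list:

# Rough sketch of a two-way distribution with row/column counts, plus a mean
# and standard deviation for a numeric variable; data and names are invented.
from collections import Counter
from statistics import mean, pstdev

records = [
    {"grade": "E3", "marital_status": "S", "tafms_months": 18},
    {"grade": "E3", "marital_status": "M", "tafms_months": 30},
    {"grade": "E5", "marital_status": "M", "tafms_months": 96},
    {"grade": "E5", "marital_status": "M", "tafms_months": 84},
]

def two_way(records, row_var, col_var):
    """Count cases in each (row, col) cell and total the margins."""
    cells = Counter((r[row_var], r[col_var]) for r in records)
    rows = Counter(r[row_var] for r in records)
    cols = Counter(r[col_var] for r in records)
    return cells, rows, cols

def summarize(records, var):
    """Mean and standard deviation of a numeric variable, where appropriate."""
    values = [r[var] for r in records]
    return mean(values), pstdev(values)

if __name__ == "__main__":
    print(two_way(records, "grade", "marital_status"))
    print(summarize(records, "tafms_months"))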

At present, information stored in ORDB is AFSC-specific.<br />

Current modifications to the Weapon System Information Subsystem<br />

(WSIS) and the Enlisted Statistical Subsystem will soon yield<br />

additional information by weapon system. This capability should be<br />

available by the end of CY90. The result will be an improved<br />

occupational research data source containing a wide variety of both<br />

historical and current information on enlisted and officer career<br />

fields of the United States Air Force. There has also been a<br />

recent proposal to place the data base on compact disc, for use on<br />

a write-once-read-many (WORM) drive. A significant increase in the<br />

number of users who could access the system would result.<br />

79


 1  Duty AFSC                 18  Primary AFSC               35* Number of Dependents
 2* Secondary AFSC            19* ASVAB-Electronic           36* ASVAB-Mechanical
 3* ASVAB-General             20* ASVAB Admin.               37  Unfavorable Info. File
 4* Subst. Abuse-Lvl.         21  Duty AFSC Prefix           38  Substance Abuse-Type
 5  Primary AFSC Prefix       22* AFQT Score                 39* APR-Most Recent
 6  Secondary AFSC Pre.       23* Current Grade              40* EPR-Most Recent
 7  Base of Assignment        24* Time in Grade              41* Current Flying Status
 8  Major Command             25* TAFMS-Months               42  Current Location
 9  SEI-PAFSC-1st             26* Cat. of Enlist.            43  SEI-PAFSC-2nd
10* Age-Years                 27  SEI-PAFSC-3rd              44  Security Clearance
11  SEI-PAFSC-4th             28* Training Status            45  SEI-PAFSC-5th
12  Ethnic Group              29  Mental Category            46  Cat. of Enlisted Status
13* Marital Status            30  Academic Ed.               47* Program Element Code
14* PMS Training Length       31* Sex                        48* Conus-Overseas
15  PMS Final Rate            32  Race
16  PMS Course ID (AFSC)      33  Prof. Military Education
17  PMS Term. Reason          34  Military Status of Spouse

Table 1. ESS VARIABLES

MPT APPLICATIONS<br />

ORDB relates many dispersed sets of data into a consolidated,<br />

rapidly accessible data base. Instead of the normal laborious and<br />

time-consuming task of finding background information by formal<br />

requests to computer data bases, searching Air Force regulations,<br />

or searching a library of technical reports and previous studies,<br />

the ORDB allows users to streamline data retrieval while saving<br />

computer resources. ORDB is valuable for aiding research design,<br />

conducting historical and cross-specialty analyses, and guarding<br />

against duplication of effort and inconsistencies between data<br />

bases. ORDB access facilitates planning and analysis support of<br />

MPT requirements earlier in the weapon system acquisition process.<br />

The WSIS is proving to be helpful to MPT planners and analysts<br />

requiring occupational information by AFSC or total weapon system.<br />

Researchers within AFHRL are primary users of the ORDB. A<br />

recent example of the ORDB's many uses was a CODAP retrieval of all<br />

duty descriptions for certain AFSCs. These descriptions were then<br />

used in the development of a taxonomy of skill, knowledge, and ability requirements to be used in weapon system acquisition to determine

MPT requirements. The ORDB has been identified as a key component<br />

of several high-priority AFHRL research projects. Some of these<br />

are the Training Decisions System (TDS), the Advanced On-the-Job<br />

Training System (AOTS), Job Performance Measurement, and the Basic<br />

Job Skills Project. The Advanced On-the-Job Training System (AOTS) program used the ORDB CODAP subsystem for its initial research. Future use of the ORDB to support the program at the base level, called the Base Training System (BTS), is presently being considered. The proposed portable WORM-drive ORDB would enable more people to use the ORDB.

The ORDB is a critical resource to projects underway as part<br />

of the MPT Integration effort. Work at ASD/ALH, the Air Force MPT<br />

Directorate, continues to require use of the ORDB. DOD directive

5000.53 calls for MPT integration early in weapon system<br />


80


acquisition. ASD/ALH made extensive use of the ORDB in April 1989<br />

when a rapid analysis of MPT and safety factors for the A-16 was<br />

called for. Included in this analysis was a data retrieval of the<br />

target population, maintenance personnel and demographics. The<br />

study's objective was to determine the target maintenance personnel<br />

who were applicable to the A-16 and what their jobs entail.

There are several other key ORDB users. At the Occupational<br />

Measurement Center, the ORDB is used to provide quick in-depth<br />

orientation to AFSCs and as a rapid response tool to high level<br />

management queries. The Training Performance Data Center (TPDC) in<br />

Orlando, Florida, has benefitted from accessing the system to<br />

obtain prompt, up-to-date data on Air Force specialty structures<br />

which have in turn been made available to a number of DOD agencies.<br />

TPDC researchers will soon be providing the Laboratory with a<br />

process mapping equipment to occupations, which will be a vital component in the Weapon System Information Subsystem's development.

The Air Force Management Engineering Agency (AFMEA) is hoping to<br />

use the ORDB as a cross reference for manpower studies. AFMEA is<br />

conducting special interest studies in support of an Air Staff<br />

requested study to determine which career fields report excessive<br />

man hours. Finally, AFMEA is doing a comparison of skill and<br />

experience to determine changes in the force structure. The ORDB<br />

will provide needed information to find a relative value of<br />

experience in Air Force personnel.<br />

Access to the ORDB by users outside the Laboratory is<br />

available via commercial and DSN telephone lines and through the<br />

Defense Data Network (DDN), a capability which conveniently serves<br />

a number of outside agencies currently having or requesting access.<br />

REFERENCES<br />

Longmire, K. M., and Short, L. O. (1988, December). The Occupational Research Data Bank: A key to MPTS analysis. Proceedings of the 30th Annual Conference of the Military Testing Association (pp. 262-267). Arlington, VA.

Longmire, K. M., and Short, L. O. (1989, July). Occupational Research Data Bank: A key to MPTS analysis support (AFHRL-TP-88-71). Brooks AFB, TX: Manpower and Personnel Division, Air Force Human Resources Laboratory.

Olivier, L., Pfeiffer, G., and Menchaca, J., Jr. (1990, January). Occupational Research Data Bank user's manual (AFHRL-TP-89-62). Brooks AFB, TX: Manpower and Personnel Division, Air Force Human Resources Laboratory.

81<br />



ASCII CODAP: PROGRESS REPORT ON APPLICATIONS<br />

OF ADVANCED OCCUPATIONAL ANALYSIS SOFTWARE *<br />

William J. Phalen, Air Force Human Resources Laboratory<br />

Jimmy L. Mitchell, McDonnell Douglas Missile Systems Company<br />

Darryl K. Hand, Metrica, Inc.<br />

Abstract<br />

The development of automated procedures for selecting job and task module types from a<br />

hierarchical clustering solution and the interpretive software associated with these procedures were<br />

reported at the 1987 and 1988 MTA conferences. Over the last two years, operational testing and<br />

evaluation of this software has demonstrated its value in terms of enhanced analytic capabilities and<br />

accelerated completion of the analytic process. This report provides informative examples and<br />

experiences to illustrate how complex analyses have been accomplished by using the job and task<br />

module type selection and interpretation software to extract, organize, and display latent bits of<br />

relevant information from a CODAP database.

Introduction<br />

The principal occupational analysis technology in the United States Air Force is the<br />

Task Inventory/Comprehensive Occupational Data Analysis Programs (CODAP) approach.<br />

This system has supported a major occupational research program within the Air Force<br />

Human Resources Laboratory (AFHRL) since 1962 (Morsh, 1964; Christal, 1974), and an<br />

operational occupational analysis capability within Air Training Command’s USAF<br />

Occupational Measurement Squadron since 1967 (Driskill, Mitchell, & Tartell, 1980;<br />

Weissmuller, Tartell, & Phalen, 1988). The CODAP system is now used by all the U.S. and<br />

many allied military services, as well as a number of other government agencies, academic<br />

institutions, and some private industries (Christal & Weissmuller, 1988; Mitchell, 1988).

Recently, the CODAP system was rewritten to make it more efficient and to expand its<br />

capabilities (Phalen, Mitchell & Staley, 1987). In the process of developing this new ASCII<br />

CODAP system, several major innovative programs were created to extend the capabilities<br />

of the system for assisting analysts in identifying and interpreting potentially significant jobs<br />

(groups of similar cases) and task modules (groups of co-performed tasks). Initial<br />

operational tests of these automated analysis programs were conducted and preliminary<br />

results were reported at previous conferences (Phalen, Staley, & Mitchell, 1988; Mitchell,

Phalen, Haynes, & Hand, 1989).<br />

Over the last two years, operational testing and evaluation of new interpretive software<br />

has continued and these programs have demonstrated their value in terms of enhanced<br />

analytic capabilities and their potential to accelerate completion of an occupational analysis.<br />

Some of these programs have been released into the operational version of ASCII CODAP<br />

while others remain experimental; i.e., they are not yet in final operational form. In this<br />

presentation, we want to provide some examples of this continuing work. Such examples will<br />

also serve to illustrate how complex analyses can be accomplished more expeditiously by<br />

using the job and task module type interpretation software to extract, organize, and display

latent bits of relevant information from an occupation-specific CODAP database.<br />

* Approved for Public Release; Export Authority 22CFR125.4 (b)(13).<br />



A Suite of Advanced Interpretive Assistance Programs<br />

A set of seven programs has evolved gradually over the last few years which are meant to<br />

assist analysts in interpreting job and task clusters; some of these were completed in time to<br />

be released with the initial version of ASCII CODAP. Others are still being refined and thus<br />

are not yet ready for operational use. It is helpful to have an overview of the entire set of<br />

programs, so everyone can see how the programs relate to one another and to their ultimate<br />

objective. These programs are shown in Figure 1 below.<br />

                                       Case Clusters        Task Clusters
                                       (Job Types)          (Task Modules)
  Identify appropriate clusters        JOBTYP               MODTYP*
  Identify/display core tasks          CORTAS               TASSET
  Identify/display core cases          CASSET*              CORCAS*
  Relationship of task clusters
  to job clusters                                 JOBMOD*

Figure 1. The Set of Advanced Interpretive Assistance Programs
(* = experimental, not yet released; unmarked programs are operational in ASCII CODAP.)

The operational programs are briefly described as follows:<br />

JOBTYP automatically identifies stages in most branches of a hierarchical clustering<br />

diagram (DIAGRM) which represent the “best” candidates for job types. First, core task homogeneity,

task discrimination, a group size weight, and a loss in “between” overlap for merging stages<br />

are calculated for all stages and these values are used to compute an initial evaluation value<br />

(for JOBTYP equations, see Haynes, 1989). This value is used to pick three sets of initial<br />

stages; these are then inserted into a super/subgroup matrix for additional pairwise<br />

evaluation, in order to further refine the selection of candidate job type groups. Three final

sets of stages (primary, secondary, and tertiary groups) are then reported for the analyst to<br />

use as starting points for selecting final job types.<br />
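The JOBTYP equations themselves are documented in Haynes (1989) and are not reproduced here; the sketch below only illustrates, with an assumed weighting scheme and invented values, how four stage-level criteria of this kind might be folded into a single evaluation value used to rank candidate stages:

# Illustrative-only scoring of clustering stages in the spirit of JOBTYP's
# first evaluation pass.  The combining rule and weights are NOT the Haynes
# (1989) equations; they simply show how four stage-level criteria might be
# combined into one evaluation value used to rank candidate stages.

def evaluation_value(stage, weights=(1.0, 1.0, 0.5, 0.5)):
    """Combine homogeneity, discrimination, size weight, and overlap loss."""
    w_h, w_d, w_s, w_o = weights
    return (w_h * stage["core_task_homogeneity"]
            + w_d * stage["task_discrimination"]
            + w_s * stage["group_size_weight"]
            - w_o * stage["between_overlap_loss"])

def rank_stages(stages, top_n=3):
    """Return the top_n candidate stages by evaluation value."""
    return sorted(stages, key=evaluation_value, reverse=True)[:top_n]

if __name__ == "__main__":
    stages = [
        {"stage": 321, "core_task_homogeneity": 42.0, "task_discrimination": 18.0,
         "group_size_weight": 6.0, "between_overlap_loss": 4.0},
        {"stage": 384, "core_task_homogeneity": 55.0, "task_discrimination": 25.0,
         "group_size_weight": 3.0, "between_overlap_loss": 9.0},
    ]
    print([s["stage"] for s in rank_stages(stages)])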

CORTAS compares a set of group job descriptions (“contextual” groups) in terms of<br />

number of core tasks performed, percent members performing and time spent on each core task, and the ability of each core task to discriminate each group from all other groups in

the set. It also computes for each group an overall measure of within-group overlap called<br />

the “core task homogeneity index”, an overall measure of between-group difference called the<br />

“index of average core task discrimination per unit of core task homogeneity”, and an<br />

asymmetric measure of the extent to which each group in the set qualifies as a subgroup or<br />

supergroup of every other group in the set.<br />
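The following sketch illustrates the general idea of core tasks, within-group overlap, and between-group discrimination; the simple percent-members-performing measures used here are illustrative stand-ins, not the published CORTAS index formulas:

# Simplified stand-ins for the CORTAS ideas of within-group overlap and
# between-group discrimination; not the published index formulas.

def pct_members_performing(group, task):
    """Percent of members of a group who perform a given task."""
    return 100.0 * sum(task in member for member in group) / len(group)

def core_tasks(group, threshold=50.0):
    """Tasks performed by at least `threshold` percent of the group."""
    tasks = {t for member in group for t in member}
    return {t for t in tasks if pct_members_performing(group, t) >= threshold}

def discrimination(task, group, other_groups):
    """How much better the task characterizes this group than the others."""
    own = pct_members_performing(group, task)
    others = [pct_members_performing(g, task) for g in other_groups]
    return own - (sum(others) / len(others) if others else 0.0)

if __name__ == "__main__":
    # Each member is just the set of task IDs he or she reported performing.
    g1 = [{"A1", "A2"}, {"A1", "A2", "B1"}, {"A1"}]
    g2 = [{"B1", "B2"}, {"B2"}, {"B1", "B2", "A2"}]
    for t in sorted(core_tasks(g1)):
        print(t, round(discrimination(t, g1, [g2]), 1))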

TASSET compares clusters of tasks (modules) in terms of the degree to which each cluster<br />

of tasks is co-performed with every other task cluster (supergroup/subgroup matrix). Within<br />

each cluster, TASSET computes the average co-performance of each task with every other<br />

task in the cluster (representativeness index) and the difference in average co-performance<br />

of the same tasks with all other task clusters (discrimination index). TASSET also identifies<br />

83


tasks which meet the co-performance criterion for inclusion in clusters in which it was not<br />

placed (potential core tasks), as well as tasks that are highly co-performed with all clusters<br />

except the cluster under consideration (negatively unique tasks).<br />
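A toy version of the representativeness and discrimination indices can be sketched as follows, using the fraction of cases performing two tasks together as a simplified co-performance measure (the operational TASSET computation is more elaborate):

# Toy version of representativeness (average co-performance of a task with the
# other tasks in its cluster) and discrimination (that average minus the task's
# average co-performance with tasks in all other clusters).

def co_performance(task_a, task_b, cases):
    """Fraction of cases that perform both tasks."""
    both = sum(1 for c in cases if task_a in c and task_b in c)
    return both / len(cases)

def representativeness(task, cluster, cases):
    others = [t for t in cluster if t != task]
    return sum(co_performance(task, t, cases) for t in others) / len(others)

def discrimination(task, cluster, other_clusters, cases):
    outside = [t for c in other_clusters for t in c]
    away = sum(co_performance(task, t, cases) for t in outside) / len(outside)
    return representativeness(task, cluster, cases) - away

if __name__ == "__main__":
    cases = [{"T1", "T2"}, {"T1", "T2", "T3"}, {"T3", "T4"}, {"T4"}]
    cluster_a, cluster_b = ["T1", "T2"], ["T3", "T4"]
    print(round(representativeness("T1", cluster_a, cases), 2))   # 0.5
    print(round(discrimination("T1", cluster_a, [cluster_b], cases), 2))  # 0.38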

The experimental programs are as follows:

MODTYP - Just as the JOBTYP program automatically selects from a hierarchical<br />

clustering of cases the “best” set of job types based on similarity of time spent across tasks,<br />

the MODTYP (module typing) program selects from a hierarchical clustering of tasks the<br />

“best” set of task module types based on task co-performance across cases. The term “best”<br />

means that the evaluation algorithm initially optimizes on four criteria simultaneously (i.e.,<br />

within-group homogeneity, between-group discrimination, group size, and drop in “between<br />

overlap” in consecutive stages of the hierarchical clustering). After all stages of the clustering<br />

have been evaluated on these criteria, primary, secondary, and tertiary sets of mutually

exclusive task clusters are selected as first-, second-, and third-best representations of the<br />

modular structure of the hierarchical clustering solution. The three sets of groups are then<br />

input to another evaluation algorithm which computes super- and subgroup indices between<br />

all pairs of groups in the primary solution within the same TPath range. Based on the<br />

combined results of both evaluations, the sets of groups are revised. The final set of primary<br />

groups is input to the TASSET and CORCAS programs to provide analytic and interpretive<br />

data for each primary cluster of tasks. MODTYP output also reports the initial and final sets<br />

of primary, secondary, and tertiary groups and their evaluation indices.<br />
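The selection of a mutually exclusive set of stages can be sketched roughly as below; the greedy rule and all of the example values are assumptions for illustration, and the real program additionally re-evaluates candidates with super- and subgroup indices:

# Sketch of one step MODTYP performs: choosing mutually exclusive stages
# (non-overlapping TPATH ranges) from an already-evaluated hierarchical
# clustering.  The greedy rule and the example evaluation values are assumed.

def overlaps(range_a, range_b):
    """True if two (start, end) TPATH ranges share any tasks."""
    return not (range_a[1] < range_b[0] or range_b[1] < range_a[0])

def select_exclusive(stages):
    """Greedily keep the best-valued stages whose TPATH ranges do not overlap."""
    chosen = []
    for stage in sorted(stages, key=lambda s: s["value"], reverse=True):
        if all(not overlaps(stage["tpath"], c["tpath"]) for c in chosen):
            chosen.append(stage)
    return sorted(chosen, key=lambda s: s["tpath"])

if __name__ == "__main__":
    stages = [
        {"stage": 384, "tpath": (1, 33), "value": 0.91},
        {"stage": 321, "tpath": (1, 72), "value": 0.74},   # overlaps 384, dropped
        {"stage": 342, "tpath": (46, 72), "value": 0.80},
        {"stage": 469, "tpath": (73, 76), "value": 0.66},
    ]
    print([s["stage"] for s in select_exclusive(stages)])  # [384, 342, 469]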

In addition to the data summaries of groups noted above, which can be very complex, a<br />

graph of all final stages in TPATH sequence is generated to help the analyst understand the<br />

relationship among the possible levels of clustering. An example of such a graph is shown<br />

in Figure 2. In this case, Level 1 = primary group; Level 2 = secondary group; and Level<br />

3 = tertiary group. By showing a different symbol for each level, the graph highlights the<br />

most likely choices of groupings (task modules) for the analyst’s consideration. Used in<br />

conjunction with the Task Cluster Diagram, this display provides a quick way for analysts to<br />

make preliminary judgments as to the appropriate groups to select for further evaluation.<br />

[Figure 2 (computer output, not reproduced): a MODTYP test run for the R-1 Avionics Test Station specialty (AFS 451X7), graphing each final stage's TPATH range at its level, where Level 1 = primary group, Level 2 = secondary group, and Level 3 = tertiary group.]

Figure 2. Example MODTYP Graph of All Final Stages in TPATH Sequence (AFS 451X7)

84<br />



CORCAS - The CORCAS report characterizes task clusters selected by the analyst for<br />

further evaluation in terms of the people who most perform it, and especially those principal<br />

performers whose jobs are concentrated in this task cluster to the exclusion of all or most<br />

other task clusters. The CORCAS report may contain any type of background variable<br />

information describing a case that will fit in the allocated space, just as on a PRTVAR<br />

report; however, “base of assignment” and “job title” are often the most useful variables. An<br />

example is shown in Figure 3 below.<br />

CORCAS                         CORE CASES FOR TASK MODULES
Summary Statistics for Target Module ST0046 (CS0001 Stage 41; PS0001 435 to 437)

  Number of tasks in target module               3      Average number of tasks performed by all cases       .05
  Number of core cases in target module         11      Average number of tasks performed by core cases     1.82
  Percent of module time covered by core cases  70.11   Average percent time spent in module by all cases    .05
  Core case homogeneity index                   35.48   Average percent time spent in module by core cases  1.29

  Co-performance   Task Title
       22.81  G208 Evaluate water survival performances of students not wearing pressure suit assemblies
       18.40  K302 Perform minor repairs of life rafts, such as patching or replacing spray shields
       27.88  K319 Store life rafts

Case-Level Statistics for Target Module ST0046: Core Cases Sorted on Average Task Importance Values

                         Number                              Average     Number    Percent    Percent
                         Cases                               Task        of Core   of Tasks   Time in   Performance
  KPATH  Grade  DAFSC    Supvsd.  Base    Job Title          Importance  Modules   Performed  Module    Emphasis
    146  E5     91150    05       Mather  NCOIC Admin          78.52        9       100.00     1.99      117.24
     41  E3     91150    00       Mather  Arspc Physlgy        59.71       15        66.67     1.30       67.49
     40  E3     91130    02       Mather  Ar. Phy. Spec        56.38        7        66.67     1.65       58.84
     13  E5     91170    01       Brooks  Supv Aero Phy        53.47       26       100.00      .54       54.08
    202  E3     91150    00       Mather  Arspc Phy Spec       52.78        5        66.67     1.22       19.51
    139  E5     91150    04       Mather  Asst NCOIC Acad.     42.52        9        66.67     1.40       40.07

Figure 3. Example CORCAS Report Showing Types of Data Which Can Be Displayed

This example illustrates how the program can be useful in interpreting task clusters; in this<br />

case note that almost all cases are individuals assigned to Mather AFB, CA, where the Air<br />

Force conducts its navigator training. By assessing these data in conjunction with the three

tasks in the module, an analyst can begin to make sense out of the tentative task module.<br />

The CORCAS report makes it apparent that the three tasks are co-performed because they<br />

are all a part of the navigator training course at Mather AFB. Note also that the KPATH<br />

number for each case is also shown; this means that by crossmapping KPATH sequences and<br />

analyst-assigned job type names, we could also display the job type for each member (but<br />

would have to sacrifice some other data in order to have room in the display). We have<br />

done this experimentally and found it very useful; in some cases, it leads the analyst to<br />

reconsider the job type names initially assigned.<br />
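To make the notion of core cases concrete, here is one plausible way to pick them: keep the cases whose reported time is most concentrated in the module's tasks until a fixed share of all module time is covered. The data and the 70-percent rule are assumptions for illustration, not the CORCAS algorithm itself:

# Toy illustration of picking "core cases" for a task module.
# Percent-time data and the coverage rule are invented.

def module_time(case_times, module_tasks):
    """Percent of a case's total reported time spent on the module's tasks."""
    return sum(pct for task, pct in case_times.items() if task in module_tasks)

def core_cases(cases, module_tasks, coverage=70.0):
    """Cases ranked by module time, kept until they cover `coverage` percent
    of all time reported on the module across the whole sample."""
    ranked = sorted(cases.items(),
                    key=lambda kv: module_time(kv[1], module_tasks), reverse=True)
    total = sum(module_time(t, module_tasks) for t in cases.values())
    kept, running = [], 0.0
    for case_id, times in ranked:
        if total and running / total * 100.0 >= coverage:
            break
        kept.append(case_id)
        running += module_time(times, module_tasks)
    return kept

if __name__ == "__main__":
    module = {"G208", "K302", "K319"}
    cases = {146: {"G208": 1.2, "K319": 0.8, "A001": 5.0},
             41:  {"K302": 1.3, "A001": 9.0},
             77:  {"A001": 10.0}}
    print(core_cases(cases, module))  # [146, 41]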

CASSET - Whereas CORCAS characterizes a task cluster (module) in terms of those cases<br />

whose jobs are most representative of the task module, the CASSET program generates<br />

displays of cases whose jobs are most representative of the job types (group of cases) within<br />

a given set of job clusters. This approach permits an analyst to quickly characterize a job<br />

85


type by the salient features of its most representative and discriminating members. Like<br />

CORCAS, the CASSET report may contain any type of background variable information<br />

describing a case that will fit in the allocated space, just as on a PRTVAR report, with “base<br />

of assignment” and “job title” often being the most useful variables to aid analysts’<br />

interpretations.<br />

JOBMOD - The JOBMOD (Job Type versus Task Module mapping) program aggregates<br />

the case- and task-level indices computed by the four advanced analysis programs and uses<br />

these aggregate measures to relate task clusters to job types and vice versa. The description<br />

of job types by a handful of discriminant clusters of tasks, and the association of each task<br />

cluster with the types of jobs of which it is an important component, is a basic requirement<br />

for defining and integrating the MPT components of an existing or potential Air Force

specialty or weapons system. If AFSs are to be collapsed or shredded out, or new jobs are<br />

to be assigned to an occupational area, or old jobs are to be moved to another occupational<br />

area, such highly summarized, yet meaningfully discriminant hard data are essential (Phalen,<br />

Staley, & Mitchell, 1989:4-5).<br />
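The aggregation idea can be sketched as follows; only two of the printout's eight measures are imitated here (module task count and average percent members performing), and the job group and modules are invented for illustration:

# Sketch of the JOBMOD idea of summarizing a job group against each task module.
# Data are invented; only two summary measures are illustrated.

def pct_members_performing(members, task):
    return 100.0 * sum(task in m for m in members) / len(members)

def summarize_group(members, modules):
    """For one job group, report per-module: task count and average PMP."""
    summary = {}
    for name, tasks in modules.items():
        avg_pmp = sum(pct_members_performing(members, t) for t in tasks) / len(tasks)
        summary[name] = {"tasks": len(tasks), "avg_pmp": round(avg_pmp, 2)}
    return summary

if __name__ == "__main__":
    centrifuge_operators = [{"C1", "C2"}, {"C1", "C2", "H1"}, {"C2"}]
    modules = {"Centrifuge Operations": ["C1", "C2"],
               "Hypobaric Chamber Operations": ["H1", "H2"]}
    print(summarize_group(centrifuge_operators, modules))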

Within a specialty being studied, a JOBMOD printout is generated for each job group<br />

showing the relationships of the set of task modules to the cases representative of the job.<br />

An example of such a printout is given below:<br />

JOBMOD                ANALYSIS OF TASK MODULES WITHIN A JOB GROUP
ST0035 Centrifuge Operators (n = 5)

01 = Number of tasks in module
02 = Average Percent Members Performing (PMP) within group performing tasks within the module
03 = Average sum of Percent Time Spent (PTS) for group performing tasks within the module
04 = Percent of most time-consuming tasks' time covered by tasks in module
05 = Percent of tasks in module which are core tasks for the group
06 = Percent of the group's core tasks which are in the module
07 = Percent of tasks in module which are discriminating or unique for the group
08 = Percent of group's discriminating or unique tasks which are in the module

Module  Description                      01     02     03     04     05     06      07     08
GP0001  Hypobaric Chamber Operations     28  19.29   8.28  15.02    .00    .00     .00    .00
GP0002  Classroom Instruction            18   3.33   1.24   2.84    .00    .00     .00    .00
GP0003  Emergency Escape & Survival      12   1.67    .11    .31    .00    .00     .00    .00
GP0004  Parachute Familiarization        20    .00    .00    .00    .00    .00     .00    .00
GP0022  Centrifuge Operations            22  69.09  42.58  87.88  54.55  66.67  100.00  13.10
GP0023  Research Chamber Operations      42   8.57   6.69   9.88   2.38   5.56   33.33   8.33
GP0024  TU-103 Training                   6    .00    .00    .00    .00    .00     .00    .00
GP0035  General Tasks                     3  33.33   1.10  10.40  33.33   5.56     .00    .00

Figure 4. Example JOBMOD Report Showing Summary Relationships of Task Modules to a Job Group


Discussion<br />

The advanced analysis assistance programs outlined here represent a substantial advance<br />

in the automation of CODAP analysis, aimed at permitting the occupational analyst to focus

attention on making critical judgments, rather than spending hours and hours examining<br />

various case data or task data summaries in an attempt to develop an overall perspective on<br />

a specialty or occupational area. By using somewhat standardized displays which focus on<br />

possible job types or task clusters (modules) and defining relationships within and between<br />

given sets of jobs or modules, these programs permit an analyst to quickly decide what the<br />

potentially meaningful clusters are, and to proceed with other aspects of the analysis.<br />

There still remains some work to be done in terms of polishing the three still-experimental<br />

programs. After they are refined and finalized through additional operational testing, they<br />

will be released into the operational ASCII CODAP system, and will become available for<br />

implementation in military occupational analysis programs. Suggestions for additional<br />

analysis assistance programs which might be needed and useful are also welcome.<br />

References<br />

Christal, R.E. (1974). The United States Air Force occupational research project (AFHRL-TR-73-75, AD-774 574). Lackland AFB, TX: Occupational Research Division, Air Force Human Resources Laboratory.

Christal, R.E., & Weissmuller, J.J. (1988). Job-task inventory analysis. In S. Gael (Ed.), Job analysis handbook for business, industry, and government. New York: John Wiley and Sons, Inc. (Chapter 9.3).

Driskill, W. E., Mitchell, J.L., & Tartell, J.E. (1980, October). The Air Force occupational analysis program - a changing technology. Proceedings of the 22nd Annual Conference of the Military Testing Association. Toronto, Ontario, Canada: Canadian Forces Personnel Applied Research Unit.

Haynes, W.R. (1989, January). JOB-TYPING, Job-typing programs. In: Comprehensive Occupational Data Analysis Programs. San Antonio, TX: Analytic Systems Group, The MAXIMA Corporation. Prepared for the Air Force Human Resources Laboratory [program documentation available on the AFHRL Unisys computer].

Mitchell, J.L. (1988). History of job analysis in military organizations. In S. Gael (Ed.), Job analysis handbook for business, industry, and government. New York: John Wiley and Sons, Inc. (Chapter 1.3).

Mitchell, J.L., Phalen, W.J., Haynes, W.R., & Hand, D.K. (1989, October). Operational testing of ASCII CODAP job and task clustering methodologies (AFHRL-TP-88-74). Brooks AFB, TX: Manpower and Personnel Division, Air Force Human Resources Laboratory.

Morsh, J.E. (1964). Job analysis in the United States Air Force. Personnel Psychology, 17, 7-17.

Phalen, W.J., Staley, M.R., & Mitchell, J.L. (1987, May). New ASCII CODAP programs and products for interpreting hierarchical and nonhierarchical clusters. Proceedings of the Sixth International Occupational Analysts' Workshop. San Antonio, TX: USAF Occupational Measurement Center.

Phalen, W.J., Staley, M.R., & Mitchell, J.L. (1988, December). ASCII CODAP programs for selecting and interpreting job and task clusters. Proceedings of the 30th Annual Conference of the Military Testing Association. Arlington, VA: U.S. Army Research Institute.

Weissmuller, J.J., Tartell, J.E., & Phalen, W.J. (1988, December). Introduction to operational ASCII CODAP: An overview. Proceedings of the 30th Annual Conference of the Military Testing Association. Arlington, VA: U.S. Army Research Institute.

87



PROFESSIONAL SUCCESS OF FORMER OFFICERS IN CIVILIAN OCCUPATIONS

Paul Klein<br />

Studying at Federal Armed Forces Universities<br />

Owing to the fact that the recruitment of personnel for military service was<br />

becoming increasingly difficult, the Federal Minister of Defense set up a commission

to reorganize education and training in the Federal Armed Forces. In<br />

mid-1971, the commission presented a report suggesting, among other things, the<br />

reorganization of education and training for officers. In doing so, the commission

proceeded on the assumption that only by providing a system of education<br />

and training which - besides military requirements - also “considers to an increasing<br />

extent the soldiers’ individual interests regarding further education<br />

could we expect the Federal Armed Forces to become more attractive for volunteers,<br />

resulting in an increasing number of applicants” (Lippert/Zabel 1977,<br />

page 52).<br />

With respect to officer education and training, this statement consequently led<br />

up to the introduction of an academic course of studies as part of the officer<br />

education program. On the one hand, this course of study was to facilitate the

transition to civilian occupations for temporary-career volunteers after leaving<br />

the armed forces, thus making this type of career again more attractive for<br />

volunteers. On the other hand, it was also expected to be of benefit to all<br />

officers in the course of their service, in particular to regular officers with<br />

staff assignments, as the commission assumed that “the functions of officers in<br />

the fields of leadership, organization, training, and their responsibilities<br />

towards their subordinates today make different demands on them than they did in<br />

the past, and that these demands can hardly be met by a system of officer education<br />

and training which emphasizes a rather practical approach, i.e., passing<br />

on experience previously gained” (Ellwein et al. 1974, page 12). Finally, the

course of studies was to provide an alternative for regular officers who - for<br />

whatever reasons - might decide to correct their original choice of occupation.<br />

For pragmatic, economic, and academic reasons the commission suggested that the Federal Armed Forces should establish their own universities. Lectures at these universities commenced on 1 October 1973 in Munich and Hamburg.

For all officers with an extended period of enlistment, studying at one of the<br />

two Federal Armed Forces universities is an obligatory part of their education.<br />

An officer has three and a half years to complete his course of study. To make<br />

maximum use of this study period, which is rather short as compared with courses<br />

at civilian universities, studies are based on the trimester system.<br />

When the universities opened in 1973, the courses offered in Hamburg and Munich<br />

included mechanical engineering, electronics, economics and managerial science as well as pedagogics, with additional courses provided in Munich in the fields

of aerospace engineering, civil engineering including geodesy, and computer<br />

science. Additional courses have been added in the meantime.<br />

88


From a technical point of view, there are no major differences between the<br />

courses of studies provided at the two Federal Armed Forces universities and the<br />

corresponding courses provided at civilian universities. Just like there,<br />

studies are completed by taking the diploma examination. Students who pass the<br />

examination successfully are conferred an academic degree, such as “Diplomingenieur”.

Concept and Conduct of the Study

In 1983 the Federal Minister of Defense assigned to the Federal Armed Forces<br />

Institute of Social Sciences the task of conducting a study on the opportunities<br />

and problems involved in the transition of officers with extended periods of<br />

enlistment to the civilian working life. The study was to consider both officers<br />

who completed a course of studies at the Federal Armed Forces universities and<br />

temporary-career volunteers with an extended period of enlistment who left the<br />

Federal Armed Forces at the end of their service without an academic education,<br />

as well as jet pilots and weapons system operators who retired from service when<br />

reaching the age of 41.<br />

The study was designed as what is called a panel survey, i.e., all the officers

with an extended period of enlistment who left the armed forces in 1984 and 1985<br />

were questioned for the first time when retiring from active service, and a<br />

second time three and a half years later using a standardized questionnaire. The<br />

results given now in this presentation have been obtained from the second survey<br />

among the retired officers. This second survey was conducted in late 1987<br />

through early 1989 and included almost 60 % of all temporary-career volunteers<br />

with a twelve-year period of enlistment, as well as the pilots who left the

Federal Armed Forces in 1984 and 1985.<br />

Results<br />

If we take for granted that the results obtained are representative of officers

with an extended period of enlistment who have retired from active service, we<br />

may say that those who graduated from a Federal Armed Forces university have<br />

managed to become integrated into the civilian working life.<br />

Using the answers obtained from the officers as a basis, we may permit ourselves

to state that the majority of university graduates had no major difficulties in<br />

adapting themselves to the demands made in the civilian sphere, and that they<br />

were successful in their subsequent civilian careers. Except for pedagogics, it is obvious that all courses of studies provided at the Federal Armed Forces universities may well be said to pave the way for a civilian career as well.

Owing to the fact that the situation on the labor market was extremely unfavorable,<br />

it was difficult for those who graduated in pedagogics to find a civilian<br />

occupation closely related to their field of studies in the Federal Republic<br />

of Germany. The fact that they nevertheless managed to find occupations, even<br />

though many times outside the field of pedagogics, testifies to the flexibility<br />

and adaptability of these officers.<br />

89


Some three and a half years after leaving the Federal Armed Forces 80.8 % of the<br />

420 university graduates who had been questioned were employed, 1.9 % were<br />

trainees, and 0.2 % were out of work. Of those employed, 72.1 % worked for private<br />

enterprises and 18.6 % had joined the civil service. 7.7 % had set up on<br />

their own by that time, or worked freelance.<br />

There were clear differences with regard to satisfaction on the job, the income<br />

situation, and career prospects between officers who had decided to work for a<br />

private enterprise and those who had decided in favor of the civil service.<br />

Generally speaking, it may be said that those who chose the “safe” way of a<br />

civil service career - possibly because thinking along the lines of job security<br />

and shying away from taking risks - had to pay the price by having to put up<br />

with limited perspectives regarding income and promotion.<br />

Since in the Federal Republic of Germany the pay grades of officers are comparable<br />

to the pay grades of other civil service members, we were able to find

out that the majority of the officers who decided in favor of a civil service<br />

career had not achieved a higher-ranking position as compared with the last<br />

position they held in their military career. Correspondingly, the same may be<br />

said of their financial situation.<br />

Those questioned who had opted for private enterprise revealed a quite different<br />

development. A mere 7.1 % of them stated that they earned less now than they had<br />

in their last assignment in the armed forces, as opposed to 80.0 % who said that

their income had increased slightly or even considerably. Particularly those<br />

working in the field of engineering regarded their financial position to be<br />

quite favorable. More than 75 % of them pointed out that their salary was now<br />

considerably higher than the pay they had received as officers. All of the

computer scientists who were questioned said that their income had increased<br />

considerably.<br />

Of the university graduates questioned, 84.4 % were largely content with their<br />

civilian occupations and career prospects. Their expectations, so they said, had<br />

been met. 12.3 % said their satisfaction was limited and their expectations had<br />

not come true in many cases. Among those who spoke of limited satisfaction and<br />

disappointment on the job was a relatively large number of those who had studied<br />

pedagogics but also two of the seven computer scientists. Typically enough, only<br />

a few of the “disappointed” ones were employed with private enterprises; most of<br />

them had joined the civil service, with many of them using a way of access which<br />

the legislature primarily provided for non-commissioned officers retired from<br />

active service. (See Table 1.)<br />

The positive assessment of military service resulted in more than half of the<br />

graduates saying that they would go the same way again, if they had to decide<br />

once more. (See Table 2.)

Dropouts and temporary-career volunteers or regular officers without a university education had to cope with a much more difficult transition to a civilian occupation than had their graduate counterparts. Since, as a rule, they lacked training in a civilian occupation, only some of them managed to find adequate civilian employment immediately upon leaving the armed forces. Only 25 % of the dropouts questioned and merely 10 % of the temporary-career volunteers and regular officers without a university education found some civilian employment immediately after leaving the armed forces. The considerable number who did not find employment right away had to undergo some sort of vocational training, either as in-plant trainees or


90<br />



by attending schools, to meet the requirements for employment in the civilian sphere. This was not easy for them, in particular if long-term training was required. In those cases, financial bottlenecks and austerity became almost inevitable attendant circumstances, the more so since in not a few cases

they had to go through a period of unemployment - even if only brief in most<br />

cases - before starting their training period.<br />

Table 1
Overall Assessment of Military Service Made by Officers Who Graduated From University
Number (%) of Answers Received

                                                Pedagogics   Economics   Engineering   Computer
Assessment                                                                Subjects      Science
I am very content. Without any exceptions
my expectations and hopes have been fulfilled.   2 ( 3.6)
I am content. My expectations and hopes
have mainly been fulfilled.                     40 (71.4)               141 (82.4)     3 (42.9)
I am not content. Many of my expectations
and hopes have not been fulfilled.              11 (19.6)                13 ( 7.5)     2 (28.6)
I am very discontent. None of my expecta-
tions and hopes have been fulfilled.             3 ( 5.4)
I cannot answer the question.                        -
Number                                              56          100         171            7

[Several cell entries of this table are not legible in the source.]

Table 2
The Inclination of Officers Graduated From University Towards (Hypothetical) First Enlistment
Number (%) of Answers Received

                                                Pedagogics   Economics and   Engineering   Computer
Options for Answering                                        Managerial Sc.   Subjects      Science
I would enlist again for the same number
of years.
I would enlist again for a shorter period
of time.
I would not join the Federal Armed Forces
again under any circumstances.
I have not thought about this yet.
OR: I cannot say yet.
Number                                              55            100            171           6

[The individual cell entries of this table are not legible in the source.]

91


-.. v..-.----_---..--_.__.-_<br />

-_yy ..-.... - ..~.<br />

Without training in a civilian occupation the chances on the labor market were small for dropouts and temporary-career volunteers without an academic education. Those who underwent vocational training had no major difficulties in being integrated afterwards. The question as to whether this is also true of officers who took up academic studies only after they had left the armed forces cannot be answered at present, as they have not yet completed the respective

courses of studies.<br />

Owing to the difficulties experienced in the transition to civilian life, dropouts<br />

assessed the time they spent in the military less favorably than did university<br />

graduates; however, temporary-career officers without an academic education

seemed to be quite content with their military service.<br />

Table 3
Overall Assessment of Military Service Made By Dropouts and Temporary-Career Volunteers Without University Education
Number (%) of Answers

                                                Temporary-Career Volunteers      Dropouts
Assessment                                      w/o Univ. Education
I am very content. Without any exceptions
my expectations and hopes have been fulfilled.   3 ( 3.4)                         2 ( 3.4)
I am content. My expectations and hopes
have mainly been fulfilled.                     66 (75.0)                        28 (47.5)
I am not content. Many of my expectations
and hopes have not been fulfilled.              15 (17.0)                        15 (25.4)
I am very discontent. None of my expecta-
tions and hopes have been fulfilled.             2 ( 2.3)                        14 (23.7)
I cannot answer the question.                    2 ( 2.3)                            -
Number                                          88                               59

Among pilots there was a relatively high degree of discontent with regard to the<br />

civilian occupations they held three and a half years after retiring from active<br />

duty. This was primarily owing to the fact that these officers had attained high<br />

ranks before leaving the armed forces and had based their expectations on what<br />

they had achieved. If these expectations should remain unchanged it will be very<br />

difficult to remedy their situation. This applies in particular to those officers<br />

who not only expect their civilian salary to correspond to the rank they<br />

had attained (in most cases lieutenant colonel) but also try to continue their<br />

career as civilian pilots.<br />



Literature<br />


Bildungskommission beim Bundesminister der Verteidigung (Ed.): Neuordnung der Ausbildung und Bildung in der Bundeswehr, Bonn 1971.

Ellwein, Th./Müller, A.v./Plander, H. (Ed.): Hochschule der Bundeswehr zwischen Ausbildungs- und Hochschulreform, Opladen 1974.

Hitpass, J./Mock, A.: Das Image der Universitäten, Düsseldorf 1972.

Klein, P.: Der Übergang längerdienender Zeitoffiziere in das zivile Berufsleben, München 1984.

Klein, P.: Die Bewährung ehemaliger Offiziere der Bundeswehr im Zivilberuf, München 1987.

Klein, P.: Truppendiensttauglich? Zur Bewährung von Absolventen der Bundeswehruniversitäten in der Truppe, in: W.R. Vogt (Ed.): Militär als Lebenswelt, Leverkusen 1988, S. 241-250.

Lippert, E./Zabel, R.: Bildungsreform und Offizierkorps, in: Sozialwissenschaftliches Institut der Bundeswehr, Berichte H.3, München 1977, S. 49-156.

93


A MILITARY OCCUPATIONAL SPECIALTY (MOS)<br />

RESEARCH AND DEVELOPMENT PROGRAM: GOALS AND STATUS<br />

Dorothy L. Finley and William J. York, Jr.<br />

U.S. Army Research Institute Field Unit<br />

Fort Gordon, Georgia<br />

Threat, force modernization, doctrine, and force structure<br />

often change in ways which influence what is required with

respect to soldier performance. Responses to changes in soldier<br />

performance requirements to assure adequate operation and<br />

maintenance of the Army's inventory of systems often include<br />

changes in MOS and CMF designs. These changes are, in this<br />

program, called MOS restructuring, and are the focus of the

program. MOS restructuring is defined as the addition or<br />

deletion of tasks to an existing MOS, the merger or deletion of<br />

MOSs, or the assignment of tasks to a new MOS.

The Army is faced with enlarging and more varied inventories<br />

of equipment (older equipments often cannot be disposed of due to<br />

the insufficient numbers of new equipments), reduced manpower<br />

ceilings, and a reduced and changing manpower pool. The<br />

decisions made about MOS and Career Management Field (CMF)<br />

restructuring determine the number of soldiers in units versus in<br />

training (given manpower ceilings), the number of operators and<br />

maintainers needed to staff the equipments, the design of the<br />

training system, and the levels of aptitudes required. Analyses<br />

have demonstrated that what appears to be the best MOS<br />

restructuring option with respect to one of these factors may be<br />

a very bad option with respect to the other factors. The goal of<br />

this program is to develop decision aids to facilitate the<br />

identification of optimal, not suboptimal, MOS restructuring<br />

solutions with respect to manpower, personnel, and training<br />

resource considerations, and the requirements for unit<br />

performance.<br />

There are several considerations and constraints involved in<br />

any action to restructure MOSs. A fundamental concern is task

and equipment commonalities and differences. One does not want<br />

to assign a set of tasks to a soldier which are so different and<br />

numerous as to impose too large a training requirement or

require too high a level of too many different aptitudes. One<br />

does, on the other hand, want to assign a sufficiently large<br />

number of tasks such that the soldier will be fully employed and can

be flexibly assigned. This concern must be considered within the<br />

contexts of both requirements and constraints. Requirements

include such items as aptitude and gender job requirements,<br />

manpower utilization and training requirements, and the need for<br />

career progression opportunities. The constraints include such<br />

items as manpower pool characteristics and size, manpower<br />

ceilings, available training resources, geographical and


organizational distribution of the equipments, and the size of<br />

the MOS and relative percentages of soldiers across the grade<br />

levels. Overall, MOS restructuring can be summarized as a<br />

complex, multi-dimensional decision. The considerations, and<br />

constraints versus requirements, relate to at least training<br />

impacts, personnel characteristics, force structure, equipment<br />

design, personnel resources, manpower resources, and task<br />

structure.<br />

As noted above, the program objective is to develop aids to<br />

facilitate MOS and CMF restructuring decisions regarding such<br />

questions as: Is restructuring needed at all? Should a new MOS<br />

be created? Should MOSs be merged? Is an overall redesign of the branch MOSs and CMFs needed? Whatever is done impacts

directly on the branch training system design. The addition or<br />

deletion of tasks which require training imposes a requirement to<br />

modify the training system to accommodate those changes.

Program Overview<br />

The current formulation of the program is presented in Figure 1. Work has been accomplished or is projected for the near future on: The Army Authorization Documentation System (TAADS) (a manpower data base), and personnel and training data bases; the ability, equipment, and task domains; and trade-off algorithms. Recent accomplishments with respect to the TAADS data base, and

the ability and equipment domains will be described in the next<br />

section.<br />

As depicted in Figure 1, the intent is to provide the<br />

analyst with the tools needed to identify desirable MOS restructuring possibilities, and to consider these within manpower, personnel, and training resource constraints; and then

to provide the means to do tradeoffs between the alternatives<br />

with respect to manpower, personnel, and training impacts. In<br />

Figure 1, under "Trade-Off Algorithms", both operations-based and<br />

requirements-based are noted. Operations-based analyses are<br />

those performed, in the Army, by the Personnel Proponent as the<br />

basis for preparing the paperwork which will actually cause a MOS<br />

restructure action to be implemented. These analyses are<br />

sometimes triggered by the outcomes of requirements-based<br />

analyses. Requirements-based analyses often take place when<br />

there is a major change in equipment inventories, doctrine,<br />

organization, or force structure. These requirements-based<br />

analyses tend to be performed by the combat developers in<br />

coordination with the training developers and personnel<br />

proponents.<br />



[Figure 1. Overview of the MOS restructuring program to develop decision aids. The figure shows the TAADS and PMAD data base, the personnel and training data base, and the ability, equipment, and task domains feeding current and projected position and manpower, personnel, and training resources into operations-based and requirements-based trade-off algorithms, which yield optimum manpower, personnel, and training alternatives.]

Equipment Domains

Recent Accomplishments<br />

Equipment domains are defined as groupings of equipments<br />

based on their similarities with respect to equipment<br />

descriptors. Human factors specialists dealing with the<br />

development of new systems have always defined tasks, ability<br />

requirements, etc. in terms of the design of that new item of<br />

equipment. Many MOSs, however, deal with many systems and, when a new item of equipment is entered into the inventory, one must consider inventory groupings in making MOS assignment or restructuring decisions. The identification of appropriate descriptors began as a part of assisting the Signal Branch Personnel Proponent in developing a training strategy to support the merger of three MOSs, with two of them becoming an Additional Skill Identifier (ASI) to the merged MOS. After investigation it was determined that, in terms of equipment commonalities, only one of the MOSs should be assigned an ASI. This finding resulted



in training cost savings, a reduced training attrition rate,<br />

improved position fill capability, and increased promotion potential for the soldiers. Drawing upon this research, an initial

Equipment Domains Assessment Procedure (EDAP) has been developed<br />

which shows promise for identifying equipment domains appropriate<br />

for operators. Identifying equipment domains appropriate for<br />

maintainers is a more complex problem and will take further<br />

research.<br />
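The EDAP itself is not documented here, but the general idea of grouping equipments by the similarity of their descriptor profiles can be illustrated with a short sketch. Everything in the example below (the equipment names, descriptor ratings, clustering method, and distance cutoff) is hypothetical and is not taken from the EDAP.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Rows = items of equipment, columns = descriptor ratings (hypothetical 1-7 scales).
equipment = ["RADIO_A", "RADIO_B", "SWITCH_C", "ANTENNA_D"]
descriptors = np.array([
    [6, 5, 2, 1],
    [6, 6, 2, 1],
    [2, 3, 7, 5],
    [3, 2, 6, 6],
], dtype=float)

# Agglomerative clustering on Euclidean distances between descriptor profiles;
# each resulting cluster is a candidate equipment domain.
tree = linkage(pdist(descriptors, metric="euclidean"), method="average")
domains = fcluster(tree, t=3.0, criterion="distance")   # the 3.0 cutoff is arbitrary

for item, dom in zip(equipment, domains):
    print(f"{item}: candidate equipment domain {dom}")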

Ability Domains<br />

The Army Research Institute Fort Huachuca Field Unit has refined the Job Abilities Assessment System (JAAS) and added a part C, in addition to existing parts A and B, specific to military intelligence. Mr. York will present a paper at this meeting describing the application of the refined JAAS, parts A and B, to MOSs in the Signal Branch. I am going to describe their application to intelligence MOSs to derive ability requirements profiles that can be compared to assess the reasonableness of MOS assignment to a new system. It is of interest to us because the profiles are derived through analysis of the tasks assigned to the soldier and, therefore, provide a means of appraising whether the restructuring proposed, i.e., the reassignment of tasks to MOSs, creates too great a demand on

ability requirements.<br />

JAAS consists of a taxonomy of 50 abilities (e.g., dynamic<br />

strength, written expression) which, for presentation purposes,<br />

are often grouped into eight clusters (e.g., gross motor skills,<br />

communication skills) and a set of procedures for making scalar<br />

judgments regarding the level of each of the 50 abilities<br />

required to perform a set of tasks. This technique was used to<br />

develop ability profiles for several intelligence MOSs and to

appraise the ability requirements for a new intelligence system.<br />

It was determined that some of the intelligence MOS ability<br />

requirements profiles were distinctly different. It was further<br />

determined that the particular MOS selected to perform operations<br />

and control tasks on the new system was a good choice in that the<br />

ability requirements profile for the MOS closely matched the<br />

ability requirements profile for those tasks on the new system.<br />

TAADS Data Base<br />

Up to 60% of the effort required on the part of the<br />

Personnel Proponent to prepare a MOS restructuring action is<br />

devoted to position data analysis. Position data analysis is an analysis of the TAADS and Personnel Management Authorization Data (PMAD) data bases for each of the impacted MOSs. These contain detailed information on each MOS position currently authorized (TAADS) and projected (PMAD). This is largely performed manually

and, hence, very time consuming and error prone. There are<br />

criteria as to appropriate grade structure, etc., and it is<br />

essentially a "zero sum game". The TAADS and PMAD constitute the<br />

constrained manpower data base at a MOS position by position<br />



level with a great deal of associated information.<br />

A Position Data Analysis Job Aid-1 (PDAT-JA-1) software

TAADS analysis tool has been developed which will be installed at<br />

the first Personnel Proponent office (at the Signal Branch) in<br />

December. It automates manipulation of the TAADS data base and<br />

provides analysis tools. A place holder is in the program for<br />

the PMAD data base when it becomes available in the form needed<br />

for our purposes. The PDAT-JA-1 outputs are:<br />

* Quantitative summaries of MOS authorizations for each grade level by: grand total, ASI, Skill Qualification Identifier (SQI), major command (MACOM), tables of organization and equipment (TOE), tables of distribution and allowances (TDA), continental United States (CONUS), and outside CONUS (OCONUS);

* Deviations from the Average Grade Distribution matrix;

* Deviations from criteria regarding: space imbalanced MOS (SIMOS), gender, ASIs, and SQIs;

* Development of an acceptable grade structure; and

* (When the PMAD program is implemented) Identification of TAADS and PMAD mismatches by unit identification code and grade.

The development of an acceptable grade structure is enabled<br />

through the provision of work sheets which allow the analyst to<br />

create modified TAADSs (the original data base always remains<br />

intact). Each modified TAADS can then be run to produce the first

three outputs above until an acceptable grade structure is<br />

realized.<br />
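The following is a minimal, hypothetical sketch of the kind of quantitative summaries listed above (the first PDAT-JA-1 output); the column names and records are invented and do not reflect the actual TAADS record layout.

import pandas as pd

# One row per authorized position (columns invented for the sketch).
positions = pd.DataFrame([
    {"mos": "31C", "grade": "E4", "asi": None, "macom": "FORSCOM", "doc": "TOE", "conus": True},
    {"mos": "31C", "grade": "E5", "asi": "C2", "macom": "TRADOC",  "doc": "TDA", "conus": True},
    {"mos": "31C", "grade": "E4", "asi": None, "macom": "USAREUR", "doc": "TOE", "conus": False},
])

# Quantitative summaries by grade: grand total, by major command, and CONUS/OCONUS.
grand_total = positions.groupby("grade").size()
by_macom = positions.groupby(["grade", "macom"]).size()
location = positions["conus"].map({True: "CONUS", False: "OCONUS"})
by_location = positions.groupby(["grade", location]).size()

print(grand_total, by_macom, by_location, sep="\n\n")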



APPLICATION OF THE JOB ABILITY ASSESSMENT SYSTEM TO<br />

COMMUNICATION SYSTEM OPERATORS<br />

WILLIAM J. YORK, JR. and DOROTHY L. FINLEY<br />

U.S. ARMY RESEARCH INSTITUTE FIELD UNIT<br />

FORT GORDON, GEORGIA<br />

As the Army introduces major new equipment into its<br />

inventory, there is a need to restructure Military Occupational Specialties (MOS) and to reclassify soldiers from old MOSs into new MOSs. Identification and quantification of specific soldier abilities required to perform in a new MOS would enhance both the

training development and personnel management decision process<br />

associated with major reclassification actions. A method for<br />

mapping soldier ability requirements from old to new MOSs would

provide Army managers with a useful tool in the areas of force<br />

structure design and personnel or job classification.<br />

In support of this, the ARI Fort Gordon Field Unit is conducting research using the JAAS methodology developed by Fleishman to determine if significant differences in terms of the JAAS abilities exist among Signal MOSs and if unique ability patterns are significant enough to support mapping from old MOSs to new MOSs. Moreover, we hope to identify a group of abilities that could be measured by existing tests. This effort, to digress for a minute, supports a need to determine how best to reclassify soldiers from several existing Signal MOSs into two new MOSs. These two new MOSs support a recently introduced area communication system that is to replace the majority of the current division and Corps Signal equipment and structure. Reclassification and training of current MOS holders to perform in the new MOSs is a critical issue. The feasibility of using the JAAS methodology to determine which new MOS is most similar to existing MOSs is the primary research goal.

Our initial effort focused on existing communication MOSs. Using the JAAS abilities shown in figure 1 (pg. 4) and the ability description and scale shown in figure 2 (pg. 4), two groups of subject matter experts (SMEs) rated four Signal operator MOSs - 31C, 31M, 31L and 72E. Group A, consisting of seven senior personnel, rated all four MOSs. Group B, consisting of nine to eleven MOS SMEs, rated only their MOS. Mean scores by ability and ability cluster were calculated for each MOS. Interrater reliability was determined by applying Kendall's Coefficient of Concordance to the rank-ordering of the eight ability clusters. As shown in table 1 (pg. 2), rater agreement varied significantly among the four MOSs, as well as between the two groups of raters. Figures 3 and 4 (pg. 5) depict the difference in profiles between the two groups. Table 2 shows examples of actual ratings for two MOSs - the best and the worst - in terms of rater agreement.
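As a point of reference, Kendall's coefficient of concordance for m raters ranking n clusters is W = 12S / (m^2(n^3 - n)), where S is the sum of squared deviations of the cluster rank sums from their mean. The sketch below applies that formula to an invented rank matrix; it uses no tie correction and is not the authors' actual analysis code.

import numpy as np

def kendalls_w(ranks):
    # ranks: m x n array, each row is one rater's ranking (1..n) of the n clusters
    m, n = ranks.shape
    rank_sums = ranks.sum(axis=0)                      # column totals R_j
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()    # sum of squared deviations
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

example = np.array([                                   # invented rankings of 8 clusters
    [1, 3, 2, 7, 8, 5, 4, 6],
    [2, 3, 1, 8, 7, 5, 4, 6],
    [1, 4, 2, 7, 8, 6, 3, 5],
])
print(round(kendalls_w(example), 3))                   # values near 1 mean strong agreement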



TABLE 1
Kendall's Coefficient of Concordance (W) by MOS and rater group

MOS     GROUP A     GROUP B
72E     0.324852    0.343915
31C     0.333605    0.445887
31L     0.242180    0.124285
31M     0.090452    0.286190

TABLE 2
Example rank-order ratings of the eight ability clusters for the MOSs with the
worst (31M, Group A) and best (31C, Group B) rater agreement

31M          COMM  CON  REA  SPLD  PER-V  PER-A  PSY  GRMO
RATER 1        1    3    2    7     8      5      4    6
RATER 2        7    3    5    8     1      2      6    4
RATER 3        1    5    3    1     7      6      4    8
RATER 4        8    4    2    1     5      3      6    7
RATER 5        1    3    4    6     2      5      8    7
RATER 6        8    7    5    4     1      6      2    3
RATER 7        5    6    7    2     3      1      4    8
FINAL W = 0.090452

31C          COMM  CON  REA  SPLD  PER-V  PER-A  PSY  GRMO
RATER 1        1    6    4    2     5      3      7    8
RATER 2        1    7    5    3     6      4      2    8
RATER 3        2    4    8    7     3      5      1    6
RATER 4        2    4    6    3     5      1      7    8
RATER 5        4    3    7    8     5      1      2    6
RATER 6        3    7    8    5     6      1      2    4
RATER 7        1    6    7    5     3      2      4    8
RATER 8        1    2    3    4     5      6      7    8
RATER 9        1    3    5    4     7      2      6    8
RATER 10       3    4    8    5     7      1      2    6
RATER 11       6    4    7    3     8      1      2    5
FINAL W = 0.445887

Correlation analyses between each pair of the four MOSs were conducted using the MOS mean of each of the 50 abilities. Ratings

for Group A and B had been combined for this analysis. Results<br />

are shown in table 3.<br />

TABLE 3<br />

CORRELATION MATRIX<br />

        72E      31C      31L      31M
72E      -      .7830    .0862    .5782
31C               -      .0824    .5527
31L                        -      .1280
31M                                 -

Statistical analyses of mean differences between MOSs by ability and ability cluster have not been conducted, but visual examination indicates that differences do exist both at the ability and cluster level, as depicted in figures 5 and 6 (pg. 5).
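A sketch of the Table 3 computation is given below: pairwise Pearson correlations between the MOS mean ability profiles. The 50-element profiles in the example are random placeholders rather than the actual rating data.

import numpy as np

rng = np.random.default_rng(0)
mos_labels = ["72E", "31C", "31L", "31M"]
profiles = rng.uniform(1, 7, size=(4, 50))     # 4 MOSs x 50 mean ability ratings (placeholders)

corr = np.corrcoef(profiles)                   # 4 x 4 matrix of Pearson correlations
for i in range(4):
    for j in range(i + 1, 4):
        print(f"{mos_labels[i]} vs {mos_labels[j]}: r = {corr[i, j]:+.4f}")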



As already shown, rater reliability is poor but may be a<br />

function of the analysis approach. Additional analysis will focus<br />

on the ability level instead of the cluster level. We felt that<br />

the correlation results were highly interesting in that the<br />

degree of relationship between the MOS pairs is highly supportive of the relationships subjectively thought to exist. Moreover, we expect to see even stronger relationships, both positive and negative, at the ability subset level. For example, a correlation analysis between each MOS pair using the ability groupings of communications, auditory, psychomotor, and gross motor should reveal this increase. We believe that this approach will also be applicable to a comparison analysis between current MOSs and new MOSs.

Our future efforts will focus on several areas. First, we<br />

plan to determine rater agreement for each ability across each<br />

MOS. Using these results in conjunction with analysis of mean<br />

differences between abilities and MOSs, we intend to focus on a reduced set of abilities. This subset will be a function of rater reliability and discriminant power between MOSs. In other words, those abilities that have the best rater agreement and tend to discriminate between MOSs will be used. In addition to the development of a refined subset of abilities, we plan to analyze MOSs at the major duties level. Two new MOSs - 31D and 31F - will be analyzed using the JAAS procedure and then compared with the total MOS and major duty profiles of the four MOSs already completed. These two MOSs are the operators for the new area communications system and are the MOSs into which a significant number of Signal soldiers will be reclassified. It is hoped that

the profile comparisons will provide objective information for<br />

this reclassification effort.<br />



[Figure 1. Revised list of JAAS abilities and clusters (conceptual skills, perceptual skills: vision, perceptual skills: audition, gross motor skills, and other groupings). Figure 2. Ability description and rating scale (example anchor: "Bicycle 20 miles to work"). Figures 3-6. Ability cluster profiles by MOS and rater group; the plotted values are not recoverable from the scanned original.]


Preferences for Military Assignments in German Conscripts

K. Arndt

Federal Office of Defense Administration
Bonn, Germany

Introduction

What German conscripts know about available military assignments is primarily based on<br />

information from friends and acquaintances who have already done their military service. The<br />

media, the military counselor and visits to military units (open day) constitute an additional but<br />

less important source of information. It is generally true to say that knowledge and a general<br />

overview of all the military assignments that are available depend on whether information has<br />

been obtained passively or actively. Preconceived ideas about military activities do often lead<br />

to discrepancies between everyday military life and expectations. Lack of motivation, indifferent<br />

feelings about military service and discontent with the draft procedure are the consequences.<br />

In addition to the need for objective standardized information on military assignments, measures<br />

were required to counteract the negative image of the armed forces, since the willingness of

young men to do military service has continually been decreasing over the past years. Against<br />

the background of a military threat, which was perceived to be real, the majority of those liable<br />

to military service passively agreed to military service, but as early as 1987 this percentage<br />

declined to less than 50 % for the first time. It must be assumed that this development has<br />

continued to date. As a result of these findings, it was decided to develop a transparent and efficient method designed to provide young men liable to military service with an overview of the requirements and qualifications for military assignments and a clear picture of job descriptions. The "Assignments - Interests - List" (AIL) is the result of this development, during which variant models of information transfer and target-oriented representation were pretested. The results

of a nation-wide AIL test are reported.<br />

Description of AIL<br />

On the basis of expert rating, the 117 possible assignments for conscripts were reduced to 25

representative assignments covering both fighting and non-fighting troops. A brief description<br />

was compiled for each of the selected assignments, including a picture of a typical activity and<br />

an account of the most important requirements and features of the job.<br />

The Assignments-Interests-List comprises the military assignments as described in Table 1. The<br />

AIL method can be used for groups or individuals. Each item is looked at without any time limits.

Pretest results<br />

An initial pretest was carried out by sampling 105 persons liable to military service. Of the 452

preferences stated, 256 (57 %) fell to assignments with non-fighting troops and 196 (43 %) to<br />

fighting troops. Application of the AIL method produced a marked increase in the number of<br />

desired assignments indicated; without AIL, the average number of assignments considered to<br />

be interesting was 2.6 while use of AIL produced an increase to an average of 4.4. The education level had no ascertainable influence on the preferences expressed. The time needed to work through

the test ranged from 6 to 22 minutes.<br />

104


Table 1
Military Assignments in the AIL

Fighting Troops                 Non-Fighting Troops
military policeman              clerk
light infantryman               radar operator
mountain trooper                teletypist
paratrooper                     supplyman
mechanized infantryman          second cook
gunner                          driver
missile gunner                  electronics technician
gunlayer                        radiomechanic
engineer                        aircraft mechanic
deck hand                       armament repairman
signal construction man         automotive vehicle mechanic
signal operating man            medical corpsman
radio operator

Sample

1,225 persons liable to military service were tested by means of AIL during the psychological<br />

qualification and placement test before they were drafted into military service. The composition<br />

of the sample ensured regional and educational representativeness.<br />

Method<br />

The 25 items of the AIL are listed in a fixed order. The testee is asked to give his opinion on<br />

each item by indicating whether he is interested or uninterested in the described assignment.<br />

Consequently, the response is obtained by the “forced-choice” method. The testee indicates his<br />

judgement on a response form which offers two categories. The preferences are numerically<br />

represented by a preference score P which is defined by the interest-to-disinterest ratio:<br />

Preference score P = total interest / total disinterest

P = 1.00: interest and disinterest in the item are equally great, i.e. there is indifference
P > 1.00: interest outweighs disinterest, i.e. the item is preferred
P < 1.00: disinterest outweighs interest, i.e. the item is rejected.
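A minimal sketch of how the preference scores and their rank order can be computed follows; the response counts are invented for illustration and are not the survey data.

# Invented response counts for three AIL items (out of the 1,225 testees).
counts = {
    "driver":      {"interested": 966, "uninterested": 259},
    "clerk":       {"interested": 521, "uninterested": 704},
    "paratrooper": {"interested": 362, "uninterested": 863},
}

# P = total interest / total disinterest for each item, then rank by P.
scores = {item: c["interested"] / c["uninterested"] for item, c in counts.items()}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for rank, (item, p) in enumerate(ranked, start=1):
    verdict = "preferred" if p > 1.0 else ("indifferent" if p == 1.0 else "rejected")
    print(f"{rank}. {item}: P = {p:.2f} ({verdict})")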

The preference scores P were placed in a rank order to show the preferences for assignments of<br />

the AIL items. Rank order comparisons between subsamples were carried out by applying<br />

nonparametric procedures (Kendall’s coefficient of concordance W). The statistical evaluations<br />

were performed on an IBM AT PC with SPSS/PC+ standard software.

Acceptance of AIL<br />

Following the pretest, the testees could express their opinion on AIL by answering the following<br />

three questions:<br />



Question 1: Do you consider such information on military assignments to be a necessary part of

the qualification test?<br />

Question 2: Has AIL provided you with any details about military assignments which you did<br />

not know before?<br />

Question 3: Are you interested in further information?<br />

Table 2 shows the response frequencies for the response categories.<br />

Table 2<br />

Response frequencies of the acceptance poll (yes = +/ no = -/ partly = 0; in percent)<br />

                                   EDUCATIONAL LEVEL
Question    total           1)            2)            3)            4)
           +   -   0     +   -   0     +   -   0     +   -   0     +   -   0
   1      86   6   8    67  22  11    85   6   9    88   3   9    92   4   4
   2      57   9  34    82   9  39    57   9  44    56   8  46    34  13  31
   3      56  26  18    55  29  16    54  26  20    59  25  16    54  28  18

1) without school-leaving certificate of secondary-level primary school
2) with school-leaving certificate of secondary-level primary school
3) lower secondary school-leaving certificate
4) secondary school graduation

The results of the acceptance poll reveal that information on military assignments provided by<br />

AIL is considered to be necessary by the majority. The higher the education level, the more widespread is this opinion.

AIL offers only basic information on selected military assignments, but its information content is detailed enough for more than 50 % of the respondents to gain additional information they didn't have before. In contrast to surveys which point to a monotonic relationship between

knowledge of military service and education level, there is no ascertainable difference in this<br />

regard.<br />

Regardless of education level, more than 50 % of all respondents wished to obtain more

information about the military assignments presented. Bearing in mind previous surveys looking<br />

into the attitudes towards military service of young men liable to military service which showed<br />

a growing rejection with increasing education levels, this result was somewhat unexpected.<br />

Despite increasingly negative attitudes towards military service, interest in obtaining information<br />

does not decline with higher education levels.<br />

In conclusion, the results of the acceptance poll show that<br />

- the respondents do not have much information on specific military assignments,<br />

- the respondents are keen to obtain additional information,<br />

- the majority of respondents welcome information on military assignments.<br />

Results<br />

The opinions expressed on the 25 AIL items in the test sample were analyzed with a view to

answering the following questions:<br />

(1) Which military assignments arouse greatest interest and which tend to appear uninteresting?



[Figure 1. Preference scores for AIL items in the overall sample (N = 1,225).]

Preference scores of the first ten rank positions:

Rank   AIL item                   P
 1     driver                   3.73
 2     clerk                     .74
 3     auto. vehicle mech.       .74
 4     military police           .70
 5     radar operator            .59
 6     armament repair           [not legible]
 7     aircraft mech.            .52
 8     gunner                    .46
 9     light infantry            .44
10     paratrooper               .42

(2) Are opinions about military assignments influenced by educational level or regional factors?<br />

(3) Do assignments with fighting and non-fighting troops meet with different degrees of interest?<br />

Figure 1 shows the preference scores P for the 25 AIL items based on the two response categories (interesting/not interesting). It clearly highlights the fact that only the assignment as driver achieves a P-score above 1.0. Since military driving licenses continue to be valid in civilian life upon completion of military service, assignment to a driver's job is of great benefit to military conscripts. The other AIL items received much lower preference scores (all less than 1.0). The scores are shown in ranked order.

The subsamples based on school education and regional background produced preferences<br />

completely different from those obtained for the overall sample. As highlighted in Table 3, the number of preferred military assignments (P > 1.0) increases as the level of school education

rises.<br />

Table 3<br />

Preference scores P > 1.0 for subsamples based on school education and regional background

                        R e g i o n s
              Northern   Central   Southern   Sum
level 1)          1          5         2        8
level 2)          1          5         2        8
level 3)          1          6         3       10
level 4)          4          5         7       15
Total             7         21        14       41

1) without school-leaving certificate of secondary-level primary school
2) with school-leaving certificate of secondary-level primary school
3) lower secondary school-leaving certificate
4) secondary school graduation


A statistical analysis of the rank ordered P-scores using KENDALL’s coefficient of concordance<br />

W did not produce any significant differences between the school education samples within the<br />

specific regions.<br />

The results presented in Table 3 show that although the AIL items were appraised in the regions<br />

with varying degrees of intensity, the ranking positions of the preference scores largely coincide. To compare the preferences for assignments with fighting as opposed to non-fighting forces, the mean ranking positions of the individual assignments were taken. In both the overall sample and the subsamples based on region, assignments with non-fighting forces achieved in each case a significantly better mean ranking position than those with fighting forces.

Table 4<br />

Mean ranking positions for assignments with fighting and non-fighting forces (* = p < .05; ** = p < .01)

                                                   Difference in
Region     Fighting forces   Non-fighting forces   ranking position
Northern        16.4                9.1                 7.3 **
Central         15.0               10.4                 4.6 *
Southern        15.8                9.6                 6.2 **
Total           14.9               10.7                 4.2 *

Preference for assignments with non-fighting forces was found to be broadly the same throughout<br />

the overall sample and the regional subsamples. In the subsamples based on school education,

however, there was no such uniform appraisal (Tab. 5).<br />

Table 5<br />

Mean ranking positions for assignments with fighting/non-fighting forces in subsamples based on school education 1)

                                                        Difference in
School education   Fighting forces   Non-fighting forces   ranking position
1)                      12.8                12.8               0.0
2)                      13.5                12.2               0.7
3)                      16.2                 9.0               7.2 **
4)                      15.9                 9.8               6.1 **
Total                   14.9                10.7               4.2 *

1) School education levels (1 to 4) as defined in Table 3.

It would appear that the preference for assignments with non-fighting forces increases, the higher<br />

the level of school education.<br />

Groups with a lower education level showed no or little difference in their preference for fighting

or non-fighting forces while those with a higher level of education clearly preferred assignments<br />

with non-fighting forces.<br />

The following cross-tabulation with the determinants "region" and "school education" highlights the differences between assignments with non-fighting and fighting forces.

[Table 6 / Figure 2. Differences in the mean ranking position for assignments with fighting and non-fighting forces according to school education (levels 1 to 4, see Table 3) and region (North, Center, South); significant differences underlined. The individual cell values are not recoverable from the scanned original.]

Conscripts with a

lower education level show no or non-significant differences for assignments with fighting or<br />

non-fighting forces. In contrast, these differences are clearly pronounced and highly significant<br />

in the cases of conscripts with a higher level of education. With the exception of those with a<br />

school-leaving certificate of a secondary-level primary school in the southern region and those<br />

with a lower secondary school-leaving certificate in the northern region, regional impacts on the<br />

preferences are negligible. When compared to corresponding samples in other regions, these two<br />

samples exhibit significantly higher differences in the mean ranking positions.

The results presented here concerning military assignment preferences in samples with different<br />

educational and regional backgrounds are based on the mean ranking positions for the various<br />

assignments with fighting and non-fighting forces. An analysis of the appraisals of the individual<br />

assignments produces quite divergent results. For example, in all regions those with the highest

school-leaving certificate most strongly prefer the assignment as “gunlayer” with the fighting<br />

forces, but it is only in the central region that there is a clear preference for the assignment as<br />

“paratrooper” .<br />

Conclusions<br />

AIL is an effective and objective way of providing information and ascertaining the assignment<br />

preferences of those liable to military service. In addition to individual assignment preferences,

which are important for placement, it is possible to find out about the main preferences of those<br />

liable to military service and the way they are affected by their regional and educational<br />

background. On this basis, information actions can be initiated in the pre-draft phase (e.g. in recruitment campaigns). Changes in preference scores will reveal whether such actions

are effective.<br />

The AIL procedure is beneficial both to the Federal Armed Forces as an organization and those<br />

liable to military service. Aptitude diagnosis is thus understood to be a cooperative process

between equal partners which gives the prospective conscript adequate guidance and allows<br />

room for initiative and active participation. The “classical” diagnostic criteria of objectivity,<br />

reliability and validity are supplemented by features such as fairness, transparency, acceptance,

counselling and innocuousness. Aptitude diagnosis in this form seeks to benefit both sides<br />

(testing organisation and individual candidate) equally.<br />



Aptitude-Oriented Replacement of Conscript Manpower<br />

in the German Bundeswehr<br />

Retrospective View<br />

S. B. Schambach<br />

Federal Office of Defense Administration, Bonn, Germany<br />

In 1990, the Psychological Service of the German Federal Armed Forces (GFAF) is celebrating

a very special jubilee: The Aptitude and Placement Examinations (EVP) for Draftees at the<br />

Subregion Recruiting Offices have been carried out for 25 years.<br />

Before the EVP was introduced, manpower requirements of draftees had been met solely on the

basis of medical fitness, the final assignment of a position being controlled by a lot system. At<br />

his muster, each draftee obtained a rank number chosen at random. The slots for replacement<br />

were assembled in a list and also given rank numbers. The draftees were called up by the order<br />

of the list until the slots were filled. For numerous assignments, though, only men were called<br />

up who had a specified civilian occupational training.<br />

The effect of the lot system was that many men of restricted medical fitness were assigned jobs<br />

which they were hardly apt for, while well qualified men failed to be called up for service. In<br />

contrast to this, the increasing standard of technical equipment of the forces required higher<br />

ability of the assigned manpower. In public, criticism of the call-up “lottery game” was growing.<br />

It was for these reasons that in 1965 the Aptitude and Placement Examination was instituted to<br />

be taken by each draftee who was found medically fit at his muster. For the examination method,

the US-American “Army Alpha Test Battery” formed the model, a modified version of which<br />

had already been successfully applied to army volunteers.<br />

The EVP comprises a sophisticated biographical questionnaire mainly referring to interests,<br />

skills and general performance factors, but its core is a test battery covering the areas of general<br />

ability and educational level, perception and reaction, mechanical and electrical engineering<br />

comprehension, as well as some further faculties related to specialist functions. In defined cases,<br />

a group situation test or an interview with the psychologist, or more test procedures can be<br />

applied.<br />

The same test battery is applied to army and air force volunteer applicants (except officer<br />

applicants) in order to facilitate psychological diagnostics for those applicants who are liable to<br />

compulsory service as well. The test battery for volunteers includes additional test procedures.<br />

Table 1: Aptitude and Placement Examination for GFAF Draftees (EVP)<br />

Biographical analysis, with special regard to performance factors, interests and activities, school<br />

and occupational training<br />

Test battery: General intelligence and educational level, technical comprehension (mechanical<br />

and electric engineering), perception-reaction capacity; under defined circumstances also:<br />

Perception tests, group situation test, interview, etc.<br />

Evaluation of behavior characteristics and expression in writing<br />



Aptitude-Oriented Manpower Replacement on the Basis of EVP Assignment<br />

Proposals - Some Remarks on the GFAF Recruiting System<br />

The EVP psychologist, on the basis of his diagnostic findings, works out for each draftee<br />

proposals for his aptitude-oriented placement in military service. The psychologist's assignment

options are mere recommendations to the recruiting agencies since as yet a draftee has no legal<br />

claim to be trained according to his EVP aptitude assessment. The administration officials in<br />

charge of personnel replacement are instructed to give priority to the EVP results.<br />

Each recruiting official has to record by data input into the central computer, the degree to which<br />

he has taken aptitude objectives into account in every single replacement decision, i.e. regarding<br />

every single draftee who was given an assignment. The following levels of quality in personnel<br />

replacement are discerned:<br />

1 - Aptitude-oriented replacement

2 - Job-related replacement<br />

3 - Occupation-related replacement<br />

4 - “Quantitative” replacement (regardless of aptitude).<br />

In this list, consideration of aptitude criteria decreases from step to step.

1) Aptitude-Oriented Personnel Replacement<br />

Each military occupation on entrance level is characterized by a job title and a corresponding<br />

specialty number stating the military service (army, air force, navy) and the type of job. Groups

of similar jobs are combined and labeled by an alphanumeric “assignment symbol”. These<br />

symbols (over a hundred) were specially designed to facilitate personnel replacement. Most of<br />

the symbols comprise several specialties of equal medical and psychological job requirements.<br />

The assignment symbols and their respective job titles and specialty numbers are listed in the<br />

so-called Personnel Requisition Table where the symbols are again grouped with respect to<br />

different fields of service, e.g. artillery functions, aircraft repair, medical duties. The Personnel<br />

Requisition Table also contains additional requirements and hints for placement, linked to<br />

assignment symbols, as e.g. a certain civilian occupational training which is a prerequisite of an<br />

assignment, or if high school graduates are wanted for these jobs.

The troops announce their manpower requirements by giving the assignment symbols. At<br />

present, the core requirements for each of the four annual call-ups are announced half a year<br />

before. Only shortly before each call-up term can the complete personnel requisition be set up<br />

which includes personnel fluctuations by drop-outs, organizational changes, changes in the<br />

degree of medical fitness, enlistment as volunteer, etc.<br />

The recruiting organization can, after its preparatory activities during the course of conscription (registration of men liable to service, muster, EVP), dispose of accumulated and computer-represented

data on every single man due for conscription. Even before psychological assignment<br />

proposals are present, the computer will automatically pick out a provisional assignment<br />

symbol corresponding to a man’s civilian occupational training (if he has any). The psychological<br />

assignment proposals are also recorded in the central computer. They are as well given in<br />

terms of assignment symbols, relating to the above-mentioned Personnel Requisition Table. At

present, the psychologist may propose up to 9 different assignments for a draftee.<br />

Following a computer-aided optimizing model, the manpower requirements of the troops - in<br />

terms of assignment symbols - are shared out between the subregional recruiting offices. Those



recruiting offices of a military district which will, according to their stock of assignment symbols, best be able to meet the required symbols, are allotted the requisitions.

The computer will also support the recruiting official in placing a draftee on a specified slot.

The machine will automatically provide a placement proposal by fitting a requisition symbol at<br />

issue to one of a man’s given psychologically-based assignment symbols. Yet a great number<br />

of placements are still carried out manually because persons with special characteristics or

certain personal and social circumstances (unemployed, married, medical doctors, etc.) have to<br />

be considered with priority.<br />
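The paragraph above describes an automated matching of open requisition symbols against a draftee's proposed assignment symbols. The sketch below illustrates that step in a simplified way; the symbols, slot counts, and greedy first-fit rule are assumptions for illustration, not the actual placement algorithm.

def propose_placement(proposed_symbols, open_requisitions):
    # Return the first proposed symbol with an open slot; None means manual handling.
    for symbol in proposed_symbols:              # symbols are in the psychologist's priority order
        if open_requisitions.get(symbol, 0) > 0:
            open_requisitions[symbol] -= 1       # reserve the slot
            return symbol
    return None

requisitions = {"FM1": 2, "DR3": 0, "GS7": 1}    # invented symbol -> open slots
draftee_proposals = ["DR3", "FM1", "GS7"]        # invented; up to 9 symbols per draftee
print(propose_placement(draftee_proposals, requisitions))   # -> FM1 (DR3 has no open slot)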

2) Job-Related Personnel Replacement<br />

Sometimes a requisition of a certain job and respective symbol is at issue for which a man due<br />

for that call-up term, and bearing a corresponding assignment symbol, cannot be found.<br />

Conversely, there may be a draftee who, for social, occupational, or other reasons, is to be called

up for a certain term when there happen to be no requisitions for the symbols proposed for him.<br />

To help the recruiting official in cases like these, the Psychological Service has supplied special<br />

lists which indicate whether a given symbol may be substituted by another one because of similar

aptitude factors, or whether a symbol for a highly qualified job may be replaced by another one<br />

calling for less qualification. For instance, a draftee whose aptitude as a paratrooper has been

stated, will likewise be apt as a guard and security soldier, even if this assignment symbol should<br />

not be proposed for him.<br />

3) and 4): Occupation-Oriented and Quantitative Replacement<br />

In these cases, a position is filled regardless of the psychologist’s aptitude-oriented assignment<br />

proposals, or, as an exception, there are no such proposals at all. Under condition 3), the<br />

assignment will at least follow the man’s civilian occupational training. In most of these cases,<br />

draftees are concerned in whom there prevail exceptional life situations. In such placement<br />

decisions, the psychologist in charge shall collaborate as a consultant.

In about 95 % of the call-ups, the psychologist’s assignment proposals have been taken into<br />

account by the recruiting officials, as is shown by the following table which reflects a long-term

state of affairs:<br />

Table 2: Quality of Personnel Replacement with Regard to Psychological<br />

Aptitude Criteria - May 1990 -<br />

                      Percentage of Placements
             Aptitude-oriented   Job-related   Occupation-related   Quantitative
Army               95                4                 0                 0
Air Force          95                4                 1                 1
Navy               92                6                 0                 1
Total              95                4                 0                 1


The Psychologist’s Method of Proposing Assignment Symbols<br />

In formulating his assignment proposals, the psychologist goes by the system of symbols laid<br />

down in the Personnel Requisition Table. The psychological aptitude prerequisites for the<br />

military jobs are compiled in the so-called Symbol Assignment Table. This table was issued by

the Psychological Service and is structured in the same way as the Personnel Requisition Table,<br />

giving the assignment symbols instead of specialty numbers. In this table, psychological aptitude<br />

profiles are set up for each assignment symbol according to the method of multiple cut-off scores.

Different kinds of prerequisites are attached to each symbol which have to be observed by the<br />

psychologist:<br />

Table 3: Psychologically-Based Assignment Proposals in Accord with:

- medical requirements
- basic intelligence level
- cut-off scores in the relevant subtests

- (for certain symbols:) additional indispensable or desirable aptitude prerequisites such as<br />

knowledge of English language, driver’s license, etc.<br />

- administrative remarks<br />

- specified civilian occupational training, if indicated in the Personnel Requisition Table<br />

The psychologist will compare the total pattern of his diagnostic findings to the job characteristics<br />

of the assignment symbols, especially to their concretized medical, occupational, test<br />

and other implications, and pick out the ones corresponding to the draftee’s aptitudes. Assignment<br />

symbols ruled out medically are absolutely excluded. With respect to the other aptitude<br />

prerequisites, the psychologist is normally given considerable latitude for judgment:

a) He may go below the prescribed cut-off scores (regarding intelligence level and subtest results) if the difference is within the frame of the confidence limits given by the test reliability.

b) Major deviations from the test profiles are permitted in individual cases if psychodiagnostically

founded. The same is valid for deviations from occupational and other prerequisites.<br />

The psychologist then ranks the assignment symbols he is going to propose. The priorities are<br />

subject to his judgment. He will take into account for which symbol the aptitude is best, and<br />

which are the draftee’s personal interests and preferences. If a draftee is suited for a so-called<br />

deficit symbol for which manpower replacement is difficult, this is regularly given priority. A

list of these symbols is available for the psychologist.<br />
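The multiple cut-off logic described above can be sketched as follows. The assignment symbols, test names, cut-off values, score direction (higher = better), and tolerance are all hypothetical; the sketch only illustrates the idea of screening a draftee's scores against each symbol's profile with some reliability-based leeway.

# Invented aptitude profiles: symbol -> cut-off score per subtest (higher score = better here).
SYMBOL_PROFILES = {
    "RRS": {"general_intelligence": 4, "electrical": 4, "reaction": 3},
    "DRV": {"general_intelligence": 3, "reaction": 4},
}

def eligible_symbols(scores, tolerance=0.5):
    # A symbol is proposed only if every cut-off is met, allowing a small
    # reliability-based tolerance below each cut-off.
    hits = []
    for symbol, cutoffs in SYMBOL_PROFILES.items():
        if all(scores.get(test, 0.0) >= cutoff - tolerance for test, cutoff in cutoffs.items()):
            hits.append(symbol)
    return hits

draftee = {"general_intelligence": 3.6, "electrical": 4.2, "reaction": 4.0}   # invented scores
print(eligible_symbols(draftee))    # -> ['RRS', 'DRV'] with these invented numbers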

Methodical Deficiencies in Aptitude Assessment<br />

The test profiles and cut-off scores established for the assignment symbols were set up according<br />

to expert ratings. They were not based on detailed job analyses. Systematic research on the<br />



validity of the EVP test methods and the performance of draftees in the jobs assigned to them are

missing for most assignment symbols. A formal comparison via central computer data between<br />

the assignment for which a man was called up, and the specialty he was awarded after basic<br />

training, gave only 75 % congruence. For this low rate, deficiencies of psychological prognostic<br />

methods are only partly the cause. Many conscripts have to be moved during the course of their

basic training for various organizational and medical reasons, so that they will complete their

service in a job other than the one assigned by the recruiting agencies. Nevertheless, the aim of<br />

the Psychological Service is to increase the percentage of correspondence by methodological<br />

improvements.<br />

Table 4: Aptitude Characteristics Relevant in Military Jobs

Perception and Reaction            Reasoning
Signal Shape Discernment           Verbal Skills
Spatial Imagination                Memory
Achievement Motivation             Mechanical/Technical Comprehension
Reliability                        Electric Engineering Comprehension
Concentration/Stress Tolerance     Psychomotor Coordination
Arithmetical Comprehension         Social Competency

As a first step, aptitude characteristics were identified by expert rating, in cooperation with the military services, for more than 400 military jobs listed in the Personnel Requisition Table. For the

14 characteristics found, see Table 4. An attempt to operationalize these characteristics by<br />

psycho-diagnostic constructs and corresponding test methods which might allow for aptitude<br />

assessment, showed that only part of these constructs are covered by traditional EVP test<br />

methods. Important characteristics, such as<br />

- spatial imagination<br />

- memory

- psycho-motor coordination<br />

- stress tolerance<br />

do not seem to be represented in our test procedures.<br />

Thorough validation studies therefore seem indispensable. At present, 36 psychologists of the<br />

recruiting organization are investigating some 30 military jobs (listed under 22 assignment

symbols). The studies include:<br />

- detailed job description and analysis of aptitude demands<br />

- identification and operationalization of probation criteria such as award of specialty, successful

completion of courses, assessment of superior, as well as personal criteria such as job<br />

satisfaction, or interest in later enlistment as volunteer<br />

- studies on probation and validation of the traditional EVP examination methods (test procedures,<br />

biographical data, etc.)<br />

- implementation and examination of other test methods, and development of new test methods<br />

if necessary; probation study on these methods.<br />



Table 5: Study on Job Characteristics: Radio Relay Soldier (Scale: 1 [best] to 7)

Test Methods                               Traditional      Proposed Test Profile
                                           Test Profile     (Operationalized Job Characteristics)
General Intelligence Index
Figure Reasoning Test
Word Analogy Test
Arithmetical Comprehension Test
Orthography Test
Mechanical Comprehension Test
Electric Engineering Comprehension Test
Reaction-Perception Test
Signal Discernment Test
Memory Test
Spatial Imagination Test
Concentration Test

[The individual profile values cannot be reliably assigned to cells from the scanned original.]

Most of the researchers have presented sophisticated job analyses, and identified probation<br />

criteria. Results of job analyses show that several job titles which are listed under the same<br />

assignment symbol, in the Personnel Requisition Table, differ in their aptitude characteristics<br />

to a degree that separation is suggested. For numerous jobs, psycho-diagnostic constructs were found for which our EVP methods do not provide sufficient information (see Table 5 for the radio relay soldier). They will probably be supplemented by test procedures which will allow

for prognosis of concentration and stress tolerance, memory, and spatial imagination.<br />

Summary<br />

A random-based system of conscript manpower replacement in the German Bundeswehr proved<br />

unable to ensure the sufficient qualification of recruits in their military jobs. Since 1965,<br />

conscripts perform a psychological Aptitude and Placement Examination (EVP) before they are<br />

called up for service. Roughly 75 % of conscripts complete their training regularly by being

awarded the specialty corresponding to their assignment. The aim of the Psychological Service<br />

is to increase this percentage by detecting in conscripts abilities yet unexploited, and making<br />

use of them in personnel replacement. This implies improvements in the methodology of<br />

aptitude diagnosis, especially also the application of new types of tests. By means of psychological<br />

job analysis, work characteristics which have not been covered by EVP diagnostics, are<br />

to be identified, and appropriate examination methods are to be developed. Additionally,<br />

empirical studies are to be carried out to investigate the validity of our present examination

methods with regard to military job demands. As a first step, aptitude characteristics of the<br />

military jobs taking part in the quarterly replacement were categorized and operationalized by

psycho-diagnostic constructs which might allow for aptitude assessment. Inspection of these<br />

constructs shows that part of them are covered by traditional EVP test methods while some<br />

important characteristics do not seem to be methodically represented in our Entrance Examination.<br />

At the moment, validation studies are being carried out on 28 different military jobs for<br />

which requirements are urgent and in which one single aptitude characteristic is prominent.<br />

Investigation designs and some first results are available.<br />


DEVELOPING A TRAINING TIME & PROFICIENCY MODEL FOR ESTIMATING
AIR FORCE SPECIALTY TRAINING REQUIREMENTS OF NEW WEAPON SYSTEMS

David S. Vaughan, Jimmy L. Mitchell, and J. R. Knight
McDonnell Douglas Missile Systems Company

Winston R. Bennett and David V. Buckenmyer
Training Systems Division, Air Force Human Resources Laboratory

Abstract<br />

Estimating training costs and training capacity constraints is among the major manpower,

personnel, and training issues in the development of new weapon systems. Use of the recently<br />

developed Training Decisions Modeling Technology in the systems acquisition process is<br />

problematic since no occupational survey data will be available as a basis for modeling the specialty,<br />

its jobs, and its training. This paper reports an innovative experimental approach using subject<br />

matter experts’ ratings of generic skill and knowledge categories for the anticipated work to predict<br />

training time and proficiencies (training setting-specific learning curves). Regression analysis

indicates that substantial proportions of the variance in training time curves can be predicted from<br />

such ratings. This approach may improve training decision making and logistic support analyses

early in the new weapon system acquisition process.<br />

Bacl


[...] required for an occupation (Ruck, 1982). It includes procedures for developing data bases

and modeling the dynamic flow of people through jobs and through both formal training and<br />

on-the-job training. Furthermore, the system includes modeling and optimization capabilities<br />

which provide estimates of training quantities, costs and capacities for both formal training<br />

and on-the-job training (Vaughan et al., 1989).

Problem - MPT Decisions in the New Weapon Systems Acquisition Process<br />

In the New Weapon Systems Acquisition Process (NWSAP), the assessment of changes<br />

required in manpower, personnel, and training programs is difficult (Gentner, 1988) - the problem is particularly acute for the largely hidden on-the-job training (OJT) costs and OJT capacity of units which will receive the new system. The TDS, with its capability to estimate

such costs and capacities, may be of considerable value in helping evaluate MPT costs and<br />

capacities in NWSAP studies, if TDS procedures can be adapted to predict needed task<br />

characteristics and to model expected impacts on job and training patterns.

TDS Training-Time Models<br />

Training-time models are important components of the TDS data base. These models<br />

may be thought of as learning curves; they translate training time on a group of tasks (task<br />

module) into the proficiency, relative to full proficiency, obtained from such training. Figure

1 illustrates a set of training-time models for an aircraft maintenance task. Note that<br />

separate learning curves were developed for several major training settings or training<br />

delivery methods, including classroom, correspondence course, guided hands-on, and OJT.<br />

These training-time models permit different training delivery methods to be traded off to<br />

find the best way, or combination of ways, to deliver training for a particular task. These<br />

training-time models play a critical role in the TDS model. In particular, they are the basis<br />

for estimating OJT training quantities.<br />

[Figure 1. Training Allocation Curve for Aircraft Environmental Systems Task Module 34: relative proficiency (percent) plotted against training hours (0 to 40), with a separate curve for each training setting.]



In the TDS R&D work, training time models were developed from SMEs' judgments

concerning training times in various training settings required to reach full proficiency. This<br />

approach proved satisfactory for ongoing MPT planning applications (Vaughan et al., 1989).

However, it poses several problems for the weapon-system-design application. First, it<br />

requires SMEs who are familiar with training on the subject tasks. For a new weapon<br />

system, there are no SMEs with “hands on” experience with training. This is a common<br />

problem in Logistics Support Analysis (LSA); the usual solution is to find existing systems

that are comparable to a new system. Data and SMEs are used for the existing comparable<br />

systems. In general, this approach involving comparable existing systems could be used to<br />

estimate training time models for the new weapon system tasks. However, because the new<br />

system often makes use of technology not incorporated in any comparable existing system,<br />

some of the new tasks have no counterparts on existing systems. Thus, the comparable existing system approach is not entirely satisfactory for our use.

Experimental Approach<br />

In the weapon-system-design application, the TDS should be sensitive to design changes<br />

and should provide feedback to designers concerning which features or aspects of their<br />

designs are the primary drivers of training requirements. The training-time-modeling method<br />

does not identify which task features or characteristics determine a task’s training time model<br />

and cannot provide feedback to designers concerning how to change a design in order to reduce

training requirements. That method would rely entirely on SMEs’ judgments based on global<br />

task experience to obtain the training time models. As a consequence, the method is not<br />

likely to be very sensitive to impacts of design changes on task training times.<br />

Equation 1 is the model that we used on the TDS R&D to estimate the training time<br />

curves such as illustrated in Figure 1:<br />

p = a_{cl}h_{cl} + a_{cl2}h_{cl}^2 + a_{cor}h_{cor} + a_{cor2}h_{cor}^2 + a_{ho}h_{ho} + a_{ho2}h_{ho}^2 + a_{ojt}h_{ojt} + a_{ojt2}h_{ojt}^2   [Equation 1]

where p = relative proficiency, the a's are regression weights, the h's are training hours in various training settings, and subscripts for training settings are defined as:

cl = classroom,<br />

cor = correspondence course,

ho = guided hands on (Air Force field training detachment courses, etc), and<br />

ojt = on-the-job training.<br />

The model of equation 1 has several features. First, it has no additive constant; zero<br />

training hours produces zero proficiency. Second, each curve of Figure 1 corresponds to the<br />

second-order polynomial equation section of equation 1 associated with a particular training<br />

setting. Third, the polynomial equation segment associated with each training setting is<br />

negatively accelerated. All eight model parameters associated with a particular task were

estimated simultaneously in a single regression analysis.<br />
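To make the form of equation 1 concrete, the following minimal sketch (Python with NumPy; the data, sample size, and coefficient values are synthetic illustrations and are not taken from the TDS R&D) fits the eight parameters for a single task module by least squares with no additive constant.

    import numpy as np

    # Minimal sketch of fitting the Equation 1 form for one task module (synthetic data).
    rng = np.random.default_rng(0)
    n = 200
    hours = rng.uniform(0, 40, size=(n, 4))   # columns: classroom, correspondence, hands-on, OJT hours

    # Design matrix: linear and quadratic hours for each of the four training settings.
    X = np.column_stack([hours[:, 0], hours[:, 0] ** 2,
                         hours[:, 1], hours[:, 1] ** 2,
                         hours[:, 2], hours[:, 2] ** 2,
                         hours[:, 3], hours[:, 3] ** 2])

    # Synthetic "true" negatively accelerated curves, used only to generate example data.
    true_a = np.array([0.04, -0.0004, 0.02, -0.0002, 0.05, -0.0005, 0.03, -0.0003])
    p = X @ true_a + rng.normal(0, 0.02, n)

    # No intercept term: zero training hours produces zero proficiency.
    a_hat, *_ = np.linalg.lstsq(X, p, rcond=None)
    names = ["a_cl", "a_cl2", "a_cor", "a_cor2", "a_ho", "a_ho2", "a_ojt", "a_ojt2"]
    print(dict(zip(names, a_hat.round(5))))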

118


For the weapon system design TDS application, our objective is to replace the separate<br />

training time equations for each task module with a single equation that can be applied to<br />

any task. In the desired equation, task modules are described by scores on scales which<br />

reflect various skill and knowledge requirements. The first step in developing such a training<br />

time model is to generalize the task-specific training time model of equation 1 to cover many<br />

tasks. This can be done by introducing dummy-coded task identification variables:<br />

p = \sum_{i=1}^{t} [ a_{cl,i}h_{cl}x_i + a_{cl2,i}h_{cl}^2 x_i + a_{cor,i}h_{cor}x_i + a_{cor2,i}h_{cor}^2 x_i + a_{ho,i}h_{ho}x_i + a_{ho2,i}h_{ho}^2 x_i + a_{ojt,i}h_{ojt}x_i + a_{ojt2,i}h_{ojt}^2 x_i ]   [Equation 2]

where x_i = dummy-coded task identification variable for task i, i = 1...t, and x_i = 1 if the current observation is for task i, 0 otherwise.

Equation 2 may be thought of as a model whose variables are interactions of tasks and training hours. This model contains 8 times t (number of task modules) interaction predictor variables. Consider a model in which the task indicator variables x_i are replaced with task descriptions in the form of scores for tasks on skill and knowledge scales:

p = \sum_{j=1}^{r} [ a_{cl,j}h_{cl}y_j + a_{cl2,j}h_{cl}^2 y_j + a_{cor,j}h_{cor}y_j + a_{cor2,j}h_{cor}^2 y_j + a_{ho,j}h_{ho}y_j + a_{ho2,j}h_{ho}^2 y_j + a_{ojt,j}h_{ojt}y_j + a_{ojt2,j}h_{ojt}^2 y_j ]   [Equation 3]

where y_j = score for the current task on rating scale j, j = 1...r.

Equation 3 may be thought of as a special case of equation 2, in which the task by<br />

training hour interaction is restricted to that portion attributable to the task rating scale<br />

scores. If the scores measure the task features that drive their training time models, then<br />

equation 3 will account for most of the proficiency variation that equation 2 can account for.<br />
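To illustrate the contrast between equations 2 and 3, the sketch below (Python with NumPy; the shapes, variable names, and random inputs are hypothetical, not from the study) builds the two design matrices: one interacting the eight hour terms with t task dummies, the other with r skill-and-knowledge scale scores.

    import numpy as np

    def hours_terms(hours):
        """hours: (n, 4) array of cl, cor, ho, ojt hours -> (n, 8) linear and quadratic terms."""
        return np.column_stack([np.column_stack([hours[:, k], hours[:, k] ** 2]) for k in range(4)])

    def design_eq2(hours, task_ids, t):
        """Equation 2 design matrix: each of the 8 hour terms interacted with t task dummies."""
        H = hours_terms(hours)                       # (n, 8)
        x = np.eye(t)[task_ids]                      # (n, t) dummy-coded task identification
        return (H[:, :, None] * x[:, None, :]).reshape(len(H), -1)            # (n, 8*t)

    def design_eq3(hours, scale_scores):
        """Equation 3 design matrix: the 8 hour terms interacted with r scale scores."""
        H = hours_terms(hours)                       # (n, 8)
        return (H[:, :, None] * scale_scores[:, None, :]).reshape(len(H), -1)  # (n, 8*r)

    # Shapes only, with synthetic inputs: 57 task modules, 26 scales.
    rng = np.random.default_rng(0)
    n, t, r = 100, 57, 26
    print(design_eq2(rng.uniform(0, 40, (n, 4)), rng.integers(0, t, n), t).shape)   # (100, 456)
    print(design_eq3(rng.uniform(0, 40, (n, 4)), rng.random((n, r))).shape)          # (100, 208)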

The next step in building our training time model was to identify a set of standardized<br />

skill and knowledge scales. For this purpose, we adopted a set of 26 skill and knowledge<br />

dimensions that was developed by occupational analysts at the USAF Occupational<br />

Measurement Center for classifying tasks in various occupations (Bell & Thomasson, 1984).<br />

More recently, these task dimensions have been revised by researchers for use in assessing<br />

skill transferability between occupations (Lance, Kavanagh, & Gould, 1989).<br />

We obtained ratings on each of the 26 scales (see Figure 2) for all task modules in the Aircraft Environmental Systems Maintenance (Air Force Specialty 423X1) occupation. This occupation contains 57 task modules, each composed of one or more occupational survey tasks (Perrin, Knight, Mitchell, Vaughan, & Yadrick, 1988). The ratings were obtained specifically for this R&D work from five Air Force Non-commissioned Officers (NCOs) who were experienced in the Aircraft Environmental Systems Maintenance occupation. Agreement among the five raters was measured for each scale by the intraclass correlation or omega-squared (Hayes & Winkler, 1971). This intraclass correlation is equivalent to the R_kk measure often used to evaluate occupational survey task factor ratings. For raw ratings,

119


intraclass correlations for many scales were zero<br />

or negative. A standardization transformation to<br />

remove scale use differences among raters was<br />

needed. For these scales, a zero rating has<br />

absolute meaning--that a task requires no skills<br />

or knowledges related to a particular scale. A<br />

standardization transformation should not change<br />

zero ratings. For this reason, the following<br />

standardization was applied to the ratings:<br />

y_{ijk} = x_{ijk} / \sum_{k=1}^{57} x_{ijk}   [Equation 4]

where y_{ijk} = standardized rating for rater i on scale j and task k, and x_{ijk} = raw rating for rater i on scale j and task k.
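A minimal sketch of this standardization (Python with NumPy; the rating array is synthetic, and its shape merely mirrors the five raters, 26 scales, and 57 task modules described here):

    import numpy as np

    # Equation 4: divide each rater's rating by that rater's total over the 57 task modules,
    # removing rater scale-use differences while leaving zero ratings at zero.
    raw = np.random.default_rng(1).integers(0, 8, size=(5, 26, 57)).astype(float)   # synthetic ratings

    totals = raw.sum(axis=2, keepdims=True)                 # per rater and scale, summed over tasks
    standardized = np.divide(raw, totals, out=np.zeros_like(raw), where=totals > 0)

    # Mean standardized rating across the five raters, later used as the y_j scores in Equation 3.
    mean_scores = standardized.mean(axis=0)                 # shape (26 scales, 57 tasks)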

Figure 2 presents interrater agreement<br />

statistics for the 26 rating scales after this<br />

standardization. The ratings have acceptable<br />

interrater agreement. Three of the scales,<br />

Medical-Patient Care, Medical-Equipment<br />

Oriented, and Medical Procedures had no<br />

non-zero ratings for tasks in this occupation.<br />

Thus, meaningful intraclass correlation statistics<br />

could not be computed for these scales although<br />

all raters agreed on all ratings for these scales<br />

(zero).<br />
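One way the per-scale agreement statistics could be computed is sketched below (Python with NumPy). This uses a one-way random-effects intraclass correlation with a Spearman-Brown step-up to the five-rater reliability; it may differ in detail from the omega-squared formulation of Hayes and Winkler (1971), and the demonstration data are synthetic.

    import numpy as np

    def intraclass_and_rkk(ratings):
        """ratings: (n_raters, n_tasks) for one scale -> (single-rater ICC, reliability of the mean rating)."""
        n_raters, n_tasks = ratings.shape
        grand = ratings.mean()
        task_means = ratings.mean(axis=0)

        ms_between = n_raters * ((task_means - grand) ** 2).sum() / (n_tasks - 1)      # between-task mean square
        ms_within = ((ratings - task_means) ** 2).sum() / (n_tasks * (n_raters - 1))   # within-task mean square

        icc = (ms_between - ms_within) / (ms_between + (n_raters - 1) * ms_within)
        r_kk = n_raters * icc / (1 + (n_raters - 1) * icc)   # Spearman-Brown for the mean of the raters
        return icc, r_kk

    demo = np.random.default_rng(2).random((5, 57))          # synthetic: 5 raters x 57 task modules
    print(intraclass_and_rkk(demo))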

For modeling purposes, we augmented the<br />

training-time data file from the TDS R&D with<br />

scores on the 26 skill and knowledge scales. We<br />

used mean standardized ratings across raters for<br />

each task and scale. If the 26 scales are a useful<br />

basis for estimating training-time models, then<br />

equation 3, which uses scores on the scales along<br />

with training times to predict proficiency, should<br />

account for most of the proficiency variation<br />

accounted for by equation 2, which includes<br />

actual task identities.<br />

Results<br />

Scale                                                Omega2   Rkk
 1. Clerical                                          .28     .66
 2. Computational                                     .13     .44
 3. Office Equipment Operation                        .34     .72
 4. Mechanical                                        .13     .43
 5. Simple Mechanical Equipment/Systems Operation     .06     .23
 6. Complex Mechanical Equipment/Systems Operation    .01     .06
 7. Mechanical-Electrical                             .15     .47
 8. Mechanical-Electronic                             .20     .56
 9. Electrical                                        .11     .38
10. Electronic                                        .20     .56
11. Electrical-Mechanical                             .22     .58
12. Electrical-Electronic                             .13     .43
13. Electronic-Mechanical                             .19     .54
14. Simple Physical Labor                             .00     .00
15. Medical-Patient Care                               *       *
16. Medical-Equipment Oriented                         *       *
17. Medical Procedures                                 *       *
18. Simple Nontechnical Procedures                    .02     .13
19. Communicative-Oral                                .20     .55
20. Communicative-Written                             .49     .83
21. General Tasks Or Procedures                       .13     .43
22. Reasoning/Planning/Analyzing                      .05     .21
23. Scientific Math Reasoning Or Calculations         .08     .31
24. Special Talents                                   .05     .19
25. Supervisory                                       .27     .65
26. Training                                          .05     .20

Note: Omega squared is the intraclass correlation measure of inter-rater agreement. Rkk is the estimated reliability for the mean rating from five raters.
*All tasks had zero ratings; inter-rater agreement statistics are meaningless.

Figure 2. Interrater Agreement Data for the 26 Skill and Knowledge Scales.

Our first modeling activity was to fit the regression model of equation 2. The R² for this model was .65, which is statistically significantly greater than zero: F(451,2255) = 9.5, p < .001. Next, we fit the regression model of equation 3, which replaced task identification variables with scores on the skill and knowledge scales. The R² for this model was .52, which is also statistically significant: F(184,2525) = 14.9, p < .001. If one views the skill and

120


knowledge-based model (equation 3) as a restricted version of the full task identity model (equation 2), the R² increase associated with the full model may be tested. That test shows that the R² difference between these models, .13, is statistically significant: F(267,2258) = 3.1, p < .01. However, the skill and knowledge scales model accounts for 80% of the variance accounted for by the full-task model. Thus, the skill and knowledge model has great practical value for estimating TDS training time models.
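The nested-model comparison reported above can be reproduced from the published figures; the sketch below (Python) applies the standard F test for an R-squared increment, using the reported R² values and the reported degrees of freedom.

    # F test for the R^2 gain of the full (task-identity) model over the nested
    # (skill-and-knowledge) model: F = (dR^2 / df_increment) / ((1 - R^2_full) / df_error).
    def r2_increment_f(r2_full, r2_restricted, df_increment, df_error):
        return ((r2_full - r2_restricted) / df_increment) / ((1 - r2_full) / df_error)

    # Reported values: R^2 = .65 vs .52 with F(267, 2258); this reproduces F of about 3.1.
    print(round(r2_increment_f(0.65, 0.52, df_increment=267, df_error=2258), 1))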

Discussion<br />

The skill-and-knowledge scale model is much more accurate than we expected. Also, it<br />

permits training-time models--learning curves--to be estimated for tasks early in the design<br />

process, and it provides feedback to designers concerning which particular task skill and<br />

knowledge requirements are causing high training times. For these reasons, we believe that<br />

the skill-and-knowledge training-time model represents a significant step forward in our

ability to design systems with acceptable training requirements, and to incorporate MPT<br />

considerations into the weapons system design process.<br />

REFERENCES<br />

Bell, J., & Thomasson, M. (1984). Job Categorization Project. Randolph AFB, TX: United States Air Force Occupational Measurement Center.

Christal, R.E., & Weissmuller, J.J. (1988). Job-task inventory analysis. In S. Gael (Ed.), Job Analysis Handbook for Business, Industry, and Government. New York: John Wiley and Sons, Inc. (Chapter 9.3).

Gentner, F.C. (1988, December). USAF Model Manpower, Personnel and Training Organization--An Update. Proceedings of the 30th Annual Conference of the Military Testing Association. Arlington, VA: U.S. Army Research Institute.

Hayes, W.L., & Winkler, R.L. (1971). Statistics: Probability, Inference and Decision. New York: Holt, Rinehart & Winston.

Lance, C.E., Kavanagh, M.J., & Gould, R.B. (1989, August). Development and Convergent Validity of Cross-Job Movement Indices. Paper presented at the annual meeting of the American Psychological Association, New Orleans, LA.

Mitchell, J.L., Ruck, H.W., & Driskill, W.E. (1988). Task-based training program development. In S. Gael (Ed.), Job Analysis Handbook for Business, Industry, and Government. New York: John Wiley & Sons, Inc.

Perrin, B.M., Knight, J.R., Mitchell, J.L., Vaughan, D.S., & Yadrick, R.M. (1988). Training Decisions System: Development of the Task Characteristics Subsystem (AFHRL-TR-88-15). Brooks AFB, TX: Training Systems Division, Air Force Human Resources Laboratory.

Ruck, H.W. (1982, February). Research and development of a training decisions system. Proceedings of the Society for Applied Learning Technology. Orlando, FL.

Ruck, H.W., & Birdlebough, M.W. (1977). An innovation in identifying Air Force quantitative training requirements. Proceedings of the 19th Annual Conference of the Military Testing Association. San Antonio, TX: Air Force Human Resources Laboratory and the USAF Occupational Measurement Center.

Vaughan, D.S., Mitchell, J.L., Yadrick, R.M., Perrin, B.M., Knight, J.R., Eschenbrenner, A.J., Rueter, F.H., & Feldsott, S. (1989, June). Research and Development of the Training Decisions System (AFHRL-TR-88-50). Brooks AFB, TX: Training Systems Division, Air Force Human Resources Laboratory.

121


EVALUATING TRAINING PROGRAM MODIFICATIONS<br />

Deborah Lawson McCormick and Paul L. Jones<br />

Naval Technical Training Command<br />

Evaluating changes in training programs is never a simple<br />

task, even under laboratory conditions where threats to validity<br />

can be controlled. In operational settings evaluation may appear<br />

to be an insurmountable problem -- one in which good evaluation<br />

methodology does not seem feasible. One major problem for

evaluators in operational settings is that they are often not<br />

consulted until after training modifications have already been<br />

initiated. As a result, both experimental control and<br />

opportunities for data collection are severely limited.<br />

Even in those rare cases where evaluators are a part of the<br />

implementation from its onset, problems exist. For example, an<br />

evaluation design which uses equivalent control and experimental<br />

groups is often not possible in on-going training programs. In<br />

addition, operational settings are inherently dynamic environments;<br />

consequently, the effects of deliberate program changes are<br />

confounded with effects of other random factors which constantly

impact the program. In these cases, isolating effects directly and<br />

unquestionably attributable to factors of the program change is<br />

impossible.<br />

This difficulty in establishing definite cause and effect<br />

relationships is sometimes used as a reason to forego evaluation.<br />

Rather than attempting a seemingly futile task, the tendency is to<br />

rely on intuition. The argument goes something like this: "These<br />

changes make sense, the students like them, the instructors like<br />

them . . . they probably work."<br />

However, increased competition for funding dollars makes the<br />

need to verify training improvement and justify additional funds<br />

crucial. Increasingly, funding sources are requiring hard data in<br />

support of dollars spent. As evaluators, we are being forced to<br />

accept that a less than perfect evaluation (that is, one which only<br />

suggests, rather than "proves," cause and effect) is better than no<br />

evaluation at all.<br />

This paper describes an evaluation model which we feel is<br />

flexible enough to prove useful in most evaluation circumstances,<br />

from the ideal condition, where evaluation has been planned in<br />

conjunction with change implementation, to those evaluation<br />

nightmares, where change implementation is complete before the<br />

evaluator is consulted. Following a brief description of the<br />

model, an application of its use is discussed.

122


EVALUATION MODEL<br />

The evaluation system we recommend approaches evaluation on<br />

two levels. At the more immediate level, we attempt to determine<br />

effects on student performance in the specific training areas<br />

modified. For example, changes in test scores or training time in<br />

those specific content areas might be analyzed. The second level<br />

attempts to determine how the training program as a whole was<br />

affected by the modifications. Such measures as course attrition,<br />

total training time, performance in other areas of the program,<br />

etc., would be considered. Inferring cause and effect<br />

relationships becomes riskier as one moves to these more general<br />

measures of effect -- measures further removed from the proposed

cause. However, modification in one area of the training program<br />

should ultimately affect the program as a whole and become<br />

manifested in these general measures. In reality, it is this<br />

broader impact which serves as the bottom line point of interest<br />

for most of our clients.<br />

The evaluation model we use can be condensed to a six-step<br />

process described below:<br />

1. Begin evaluation planning early, before implementation of the program change if possible. The evaluator should be involved as soon as possible, ideally during the modification planning stage -- certainly prior to modification implementation. Many threats to

validity can be anticipated and controlled if the evaluator is<br />

involved in this manner. Realistically, however, we know that this<br />

scenario seldom occurs. More often, the evaluator is called in<br />

after the modification has been implemented. For this reason, we<br />

usually find ourselves beginning with step two.<br />

2. Know the program you are about to evaluate. A thorough

understanding of the nature of the program change and its impact on<br />

the general operation of the training program is critical to good<br />

evaluation. The evaluator must understand the program's<br />

objectives, the anticipated impact of the change on these<br />

objectives, and the methods used to accomplish them. In addition,<br />

the evaluator must determine what data is currently being collected<br />

to evaluate program performance and whether this data might be<br />

useful in evaluating the program change. Most importantly, a<br />

definitive statement of how the change is intended to affect the<br />

program (that is, the goal of the program change) must be

formulated.<br />

3. Determine data collection procedures and gather baseline data.

The purpose of baseline data is to develop a snapshot of how well<br />

the program is performing in the area to be modified prior to the<br />

change. Often you will find existing measures of performance, such<br />

as test scores, which directly address this question. In other<br />

123<br />


cases, it will be necessary to introduce new data collection<br />

instruments, e.g., surveys, questionnaires, etc.<br />

Whether data collected should be restricted to only the area<br />

modified, to a broader segment of the program, or to the program in<br />

totality depends on the expected effect of the change. In general,<br />

the interdependence of program parts usually warrants a complete<br />

evaluation.<br />

4. Monitor the implementation of the change. Keep informed about

how the implementation is proceeding. Document associated factors<br />

which might impact the success of the change, such as changes in<br />

instructor or student attitudes, changes in quality of either<br />

instructors or students, or changes in resources.<br />

5. After the program has stabilized following incorporation of the change, gather data for comparison with the baseline. This step

involves collecting data corresponding in type to the baseline data<br />

for a sample of students under the modified program. This step<br />

will involve readministration of instruments developed for the<br />

evaluation, for example attitude questionnaires.<br />

6. Analyze data and interpret results. Most of our clients have

neither the time nor the propensity to wade through a morass of<br />

statistics. Although we sometimes use fairly sophisticated<br />

statistical procedures and usually include these analyses in the<br />

report, we always attempt to synthesize the findings for our<br />

clients. We try to answer the general question of how the training<br />

program was affected by the modification in an easily accessible<br />

one page (or one table or figure) summary.<br />

MODEL APPLICATION<br />

In 1987, a project known as the Model School program was<br />

initiated at the Electrician's Mate (EM) School at the Naval Training

Center, Great Lakes. The purpose of this project was to examine<br />

the training program for EM's at this school, explore ways to make<br />

that training better, and implement those that were feasible. As<br />

a result of this project, a number of changes took place in this<br />

program over the next two years. For instance, a technology-based<br />

learning center was instituted, changes in remediation occurred,<br />

the testing program was revised somewhat, etc.<br />

In the spring of 1990, the Research Branch of the Navy<br />

Technical Training Command was tasked with conducting an evaluation<br />

of the impact of these changes on the training program. Because we<br />

were not involved during the implementation of the project, we used<br />

a modification of the six-step approach described above.<br />

In this case, our first step was getting to know the program<br />

we were tasked with evaluating. From talking with the school staff<br />

124


and various other sources, we became familiar with the training<br />

program as it existed prior to the Model School Project, the<br />

objectives of the project and changes made to the training program<br />

as a result of it, and other occurrences which, although they may<br />

have been coincidental, had potential for impacting the training<br />

program. We found that many of the changes had potential for very<br />

subtle impact; for example, the staff's optimism for the program<br />

probably improved their teaching, but this notion is difficult to<br />

substantiate. Also, additional out-of-class study aids had been

developed and introduced throughout the training program.<br />

Cumulatively, one would expect these changes to result in improved<br />

student performance; however, we felt the attempt to isolate and<br />

attribute effects to individual factors would be impossible in a

post hoc evaluation design.<br />

With these ideas in mind, we approached the evaluation with<br />

two broad questions: (1) How did the performance of pre-Model<br />

School project students compare with the performance of post-Model

School students, in terms of attrition rate, setback (repeating of<br />

course segments) rate, test scores, and number of retests? (2)<br />

What changes occurred in the intervening time period which may have<br />

impacted student performance?<br />

Next we constructed baseline data for a group of students<br />

attending the training prior to modification for use as a quasi-control

group. The school had been utilizing an automated testing<br />

program which maintained students' scores on tests and number of<br />

retests taken. This data gave us a picture of performance in the<br />

individual content areas, as well as an overall measure of<br />

performance. We also collected more general performance measures<br />

such as course attrition and setback data.<br />

Corresponding information was collected for a comparison group<br />

who received the training after Model School Program<br />

implementation. Because academic ability levels of students in<br />

fundamental training courses have historically varied<br />

systematically with the season of the year, we selected our<br />

comparison group from months corresponding to that of the "control"

group in an attempt to maximize equivalency of the two groups.<br />

Data were analyzed and major findings presented in the summary

format shown in Figure 1. This graphic representation enabled us<br />

to overlay potential impacting factors with major measures of<br />

student performance. Our clients liked this format because it<br />

provided an at-a-glance picture of both the changes to the training<br />

program and corresponding variations in terms of student<br />

performance. In this instance, the overall impact of the program<br />

appeared to be positive in that the two major indicators of<br />

training success, attrition and setback rates, both improved.<br />

125


[Figure 1. Overlay of Model School program changes and corresponding student performance measures (attrition, setback rates, test scores); graphic not reproduced.]

126

CONCLUSION<br />

Sometimes evaluators are hesitant to perform evaluations such<br />

as the one just described because they are too messy and imprecise.<br />

When we are coerced to perform them, we tend to apologize for the<br />

product. These evaluations do little more than provide a<br />

historical description of the program in terms of its components and

its performance. But even these evaluations serve two important<br />

functions. First, they provide concise, accurate descriptive<br />

information to program managers, information otherwise not<br />

available to them. Secondly, they establish a climate conducive to<br />

evaluation. Managers become aware that many questions they would<br />

like to have answered conclusively could be answered if evaluators -'<br />

are consulted early in the modification process.<br />

In summary, our advice is to accept those messy evaluation<br />

projects, adapt proven evaluation methodology/procedures to your<br />

particular set of circumstances, and conduct the evaluation,<br />

exercising controls for validity wherever possible. At best, you'll

be able to analyze the data and draw cause and effect relationships<br />

with reasonable accuracy. At worst, you'll be able to synthesize<br />

the data and describe changes, suggesting possible causes. In<br />

either case, the client will have more information and be better<br />

equipped to make informed decisions than he/she would have been<br />

otherwise.<br />

127



The Effect of Reading Difficulty<br />

on Correspondence Course Performance<br />

Dr Grover E. Diehl (ECI)<br />

During the 1989-90 academic year the Air Force Extension Course Institute (ECI) broadly examined the impact of reading level on its Career Development Course (CDC) correspondence courses. First, reading grade levels (RGL) of CDCs were calculated using the FORCAST method, which involves manually counting words in samples of text. These RGLs were then examined to determine whether RGLs had increased significantly on a year-by-year basis, and were also compared with 1982 data to determine whether there were significant differences between the samples. Next, RGLs were correlated with end-of-course performance, first by percent of first-time exam failures and then by proportion of overall course failures. Following this, the FORCAST RGLs were correlated with target RGLs prepared by the Air Force Human Resources Laboratory and with computer-generated RGLs using the Flesch-Kincaid formula.
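For reference, the two readability indices discussed here are easy to compute. The sketch below (Python) uses their standard published forms; the counting conventions ECI actually applied may differ in detail, and the example counts are invented.

    # FORCAST reading grade level: RGL = 20 - N/10, where N is the number of
    # one-syllable words in a 150-word sample (hand-countable, no sentence parsing needed).
    def forcast_rgl(one_syllable_words_in_150_word_sample):
        return 20.0 - one_syllable_words_in_150_word_sample / 10.0

    # Flesch-Kincaid grade level from word, sentence, and syllable counts.
    def flesch_kincaid_grade(words, sentences, syllables):
        return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

    print(forcast_rgl(95))                                 # 10.5 for a sample with 95 one-syllable words
    print(round(flesch_kincaid_grade(1200, 60, 1900), 1))  # illustrative counts only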

A basic intervening variable in the assessment of reading difficulty,<br />

however, was the fact that personnel and Air Force jobs<br />

were matched during enlistment processing so that the most intellectually<br />

demanding skills were peopled with the most intellectually<br />

able personnel. One way around this problem was to<br />

calculate difference scores between the RGL targets and the obtained<br />

FORCAST RGLs -- a measure of perceived difficulty of the<br />

material to the student -- and correlate this with failure rate.<br />

This procedure treated student ability as a covariate with a<br />

corresponding reduction in the error portion of the prediction equation, without the necessity of using analysis of variance. An

analysis of difference scores constituted the last question to<br />

be addressed.<br />
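A minimal sketch of the difference-score analysis (Python with NumPy; all numbers below are synthetic placeholders, not ECI course data):

    import numpy as np

    # Difference score = obtained FORCAST RGL minus the AFHRL target RGL (student reading
    # ability); the deficit is then correlated with the course failure rate.
    forcast_rgl = np.array([11.2, 9.8, 12.5, 10.1, 11.9])     # synthetic course RGLs
    target_rgl = np.array([10.0, 10.4, 10.8, 9.9, 10.6])      # e.g., 50th-percentile reading ability
    failure_rate = np.array([0.08, 0.03, 0.12, 0.05, 0.10])   # synthetic first-time failure rates

    deficit = forcast_rgl - target_rgl
    print(round(np.corrcoef(deficit, failure_rate)[0, 1], 3))  # Pearson correlation of deficit with failures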

Findings<br />

FORCAST RGL and Edition Date. No statistically significant association<br />

was found between the FORCAST reading level and the edition<br />

date of the materials (a period of about 12 years). The<br />

Pearson Product Moment Correlation coefficient (r) of FORCAST<br />

RGL with edition date was .0742 (N = 215, p =.279). To check<br />

for possible curvilinearity, a scatterplot was prepared which<br />

suggested a completely random occurrence pattern. FORCAST reading<br />

level did not vary in a linear way from year to year.<br />

Difference Between RGLs Sampled in 1982 and 1990. There was apparently<br />

sufficient variation within the samples to be

128


statistically significant. Hotelling's T² was 5.1998 with a probability

of .027 at 1 and 2 degrees of freedom. It should be noted, however,<br />

that the test was made on a group-to-group basis and there

was no indication which individual pairs may have changed the<br />

most. It was in fact possible that no pairs would significantly<br />

vary even though the full model rejected the null hypothesis.<br />

Also, since the test was non-directional, it was not possible to<br />

identify which group contained higher RGLs than the other although<br />

they appear to have been higher more recently.


An observation related to this, however, was the significant correlation of the RGLs of the two samples (r = .4709, p = .001). This raised an interesting situation in which samples taken in 1982 and 1990, although significantly different, were

none the less related. The relatedness, however, was not developmental<br />

over time. One possible solution to this ambiguity was<br />

that RGL was varying with frequent but intermittent corrections<br />

using a current "clearly written text" standard.<br />

1990 RGLs and First Time Examination Failures. RGLs were not<br />

significantly related to first time examination failure rates (r<br />

= .0741 and probability = .327).<br />

1990 RGL and Overall Course Failures. All students failing the

first final examination were provided a retest. Course failure<br />

required failure of both the first examination and the<br />

reexamination. As was the case with first time exam failures,<br />

course failures were not significantly related to RGL in the<br />

1990 sample (r = .0404, probability = .403).<br />

FORCAST RGL and AFHRL Targets. The correlation between FORCAST<br />

RGLs of course materials in the 1990 sample and AFHRL targets of<br />

actual student reading ability (50th percentile reading ability)<br />

was .0695 and was not significant (p = .333). A reduced target<br />

at the 15th percentile also failed to be significantly related<br />

to the obtained FORCAST RGL (r = .0249 with probability = .439).<br />

The data failed to demonstrate that the variation within the<br />

reading ability of personnel was linearly related to FORCAST<br />

RGLs of the CDC material.<br />

FORCAST RGL and Flesch-Kincaid RGL Comparison. Due to resource limitations on the Flesch-Kincaid RGL side, comparisons were made on only one CDC consisting of four volumes. The means and SDs were 11.2725 and .5187 for FORCAST and 9.0800 and .1619 for Flesch-Kincaid. The obvious difference between the averages was significant with Hotelling's T² of 92.8489 and probability equal to .004 (df = 1 and 3). Flesch-Kincaid generated significantly lower RGL estimates than did FORCAST. The correlation, although

129



large by research standards (r = .5252) was not statistically<br />

significant (p = .237).<br />

Difference Scores and Failure Rates. Using a 50th percentile<br />

personnel reading ability as a target base, the correlation of<br />

the RGL deficits with first time exam failure rate was .2117<br />

with a probability equal to . 101 -- not statistically significant.<br />

When a 15th percentile target base was used, the correlation<br />

of the deficits with first time failure rates was also<br />

not significant (r = .2459, p = .068). Similar analyses of

course failure rates yielded the same result.<br />

Conclusion<br />

FORCAST reading grade levels were not significantly associated<br />

with end-of-course test performance, reading grade level targets<br />

using the Air Force Reading Ability Test scale, or Flesch-Kincaid reading difficulty obtained from a computer analysis.

Additionally, FORCAST reading grade levels had not changed consistently<br />

over time. There was evidence that RGL had risen<br />

slightly sometime during the eight year period but it was unclear<br />

whether the rise was continuing.<br />

Careful examination of the summed evidence suggested, however,<br />

that the null outcomes were possibly due to an aggressive<br />

"clearly written text" program within ECI. This effort, which<br />

replaced FORCAST in the mid-1980s, introduced an ongoing conscious<br />

effort on the part of the text writers and reviewers to<br />

ensure the readability of the materials. Earlier information<br />

suggested that use of FORCAST was associated with a reduction in<br />

reading difficulties to the point where FORCAST was no longer<br />

predictive. Present data suggested that the "clearly written<br />

text" standard may continue to limit the value of FORCAST as a<br />

predictive indicator.<br />

Discussing more generally the issue of attention to RGL, it was<br />

noted that most ways of determining RGL and tests designed to<br />

assess the reading ability of students were highly correlated -- often

as highly intercorrelated as the validity coefficients of<br />

the individual measures. Differences in outcome values were

typically due to scale. The task of maintaining acceptably low<br />

reading difficulty within written materials was primarily one of<br />

maintained focus on the problem using any of several means. FORCAST was one means easily calculated by hand. The Flesch-Kincaid RGL provided here by Right-Writer, although almost

necessitating a computer, was a viable option especially when<br />

the written material was already in an acceptable word processing<br />

medium. The Right-Writer output in fact contained consider-<br />


130


able ancillary information which could be useful to writers. For<br />

example, suggestions for making writing more direct and improvement<br />

of sentence structure were given, and there was a listing<br />

of negative words, jargon, colloquial and misused words,<br />

questionable spellings, and words which readers may not understand.<br />

External reports such as these served to alert writers<br />

and reviewers to idiosyncrasies which may distract the student<br />

from the material and to maintain focus on reading difficulty<br />

and level.<br />

Audience: Instructional developers.<br />


For more information contact Dr Grover Diehl, ECI, Gunter AFB AL<br />

36118-5643,<br />

AUTOVON 446-3641 or commercial 205-279-3641.<br />

131


Navy Basic Electricity Theory Training: Past, Present, and Future<br />

Steve W. Parchman, John A. Ellis, & William E. Montague<br />

Navy Personnel Research and Development Center<br />

THE OPINIONS EXPRESSED IN THIS PAPER ARE THOSE OF THE AUTHORS,<br />

ARE NOT OFFICIAL AND DO NOT NECESSARILY REFLECT THE VIEWS OF<br />

THE NAVY DEPARTMENT<br />

Introduction<br />

Basic electricity and electronics theory training (BETT) in the Navy has historically<br />

had high attrition and setbacks and has been plagued by questions about the relevance of<br />

the course content to Navy jobs. BETT is taught as a separate topic at the beginning of<br />

more than twenty Navy A schools to more than 20,000 students annually. BETT<br />

material historically has proven difficult for students to learn and has resulted in high<br />

attrition and set-back rates. For example, in FY 88 attrition in five electrical A schools<br />

averaged 28% (AE, ET, EM, IC, DS; total annual throughput = 5000). Average setback<br />

rate for these same schools was 69%. Approximately 70% of these losses occurred in the<br />

BETT phase of these courses. Further, the abstract nature of this content has raised<br />

questions about its relevancy for vocational jobs. For example, recent research has<br />

shown that trainees who have passed course tests fail to pass relatively simple practical<br />

exercises. These problems with trainee learning have remained even in the face of<br />

substantial expenditure of effort to revise the content and to change the method of<br />

delivering it.<br />

Research on learning and training suggests that more fundamental changes in<br />

curriculum structure can lead to improvements in learning. Research and development is<br />

needed to develop and test alternative methods for training electrical and electronic<br />

theory, with the goal of reducing both attrition and setback rates by a minimum of 25%.<br />

This paper discusses Navy basic electricity and electronics theory training (BETT)

with some suggestions for development of future training programs. It begins by briefly<br />

reviewing the history of Navy BETT training followed by a discussion of alternative<br />

approaches to this training that have been tried. Finally, several options for training<br />

improvements are presented.<br />

BETT History<br />

Through the 1950s and 60s, Navy electronics training was both theory and math

intensive. Well qualified trainees were amply available, thanks in part to the draft. “A”<br />

School electronics courses, often eight months long, challenged the trainees and also<br />

prepared them for the rigors of the “B” schools. “B” schools of up to fifteen months were<br />

available to qualified re-enlistees. These schools resembled university engineering<br />

programs.<br />

Perhaps it was inevitable that two dozen or more schools around the country<br />

independently teaching similar content would generate pressure for consolidation. In the<br />

early 1960s, consolidations were carried out, and a common syllabus, based on Bureau of

Personnel publications, was adopted at each of the major training centers.<br />

Two factors which came into play in the 1960s and 70s resulted in major changes in Navy electronics training. First, the Programmed Instruction (PI) movement reached its peak of popularity in the 60s. Use of this approach in the Navy was judged desirable,

132


and a contractor (Westinghouse) was funded by the Bureau of Personnel to convert the<br />

basic or introductory portion of Electricity/Electronics courses into a self-teaching<br />

format. The contractor's charter was not to change the substance of the course, but rather

to convert it into a different “delivery system.” With the assistance of a committee of<br />

E/E instructors from San Diego schools, the basic lectures of the BuPers syllabus were<br />

converted into narrative and PI materials, summaries were written, test items were<br />

inserted as progress checks, and module tests, midterms, and final exams were also<br />

prepared from existing test items. In 1968, a partially self-paced compromise version of<br />

several variations of instructor-taught basic electricity and electronics was offered.<br />

Second, in 1967, NPRDC (then Navy Personnel and Training Research Laboratory)<br />

and CNATECHTRA began work on a computer-managed instruction (CMI) system.<br />

One of the courses eventually put on the CMI system was the Westinghouse conversion of the basic E/E course. With only minor modifications, it was on-line at NAS Memphis as a CMI course in 1973.

A major organizational change also influenced BE/E training. Following<br />

recommendations of a review board (the Cagle Board), control of technical training was<br />

moved from Bureau of Personnel in 1972, and vested in two new organizations, Chief of<br />

Naval Education and Training (CNET) and Chief of Naval Technical Training (CNTT),<br />

with the latter absorbing the functions of CNATECHTRA. These new organizations<br />

evaluated the CMI course and concluded that this form of training could be effective and economical. A CNTT in-house group (MIISA) was created to improve and expand the CMI software. Basic E/E was consolidated in four schools and incorporated into the CMI system. In 1975, the Westinghouse version of the San Diego compromise of the BuPers version of Basic E/E was standardized throughout the Navy. Between 1975 and 1984, while there were cosmetic changes to the course material, the only substantive modifications were to increase the CMI system's ability to output various summary

reports, to eliminate some “nice-to-know” material, and to add some modules on newer<br />

technologies such as transistors and advanced circuitry. The system was effective in<br />

meeting its goal of graduating large numbers of students in a significantly reduced<br />

amount of training time.<br />

In late 1984, CNET made two major decisions regarding BETT: (1) the courses would be converted from self-paced to group-paced instruction, and (2) the training would be integrated into the appropriate "A" schools (thus, BETT courses would disappear as separate entities). The conversions began in 1985 and the majority were completed by 1989. The BETT courses were phased out. These conversions did not result in any major redesign of BETT training. Instead, the existing BETT materials were adapted and added as the initial phase to the existing "A" school courses. The decision to add BETT to 'A' school courses presented an opportunity to increase the job relevance of this training. However, the schools were unable to do this during the initial adaptation phase because they did not have the resources for making major course revisions.

In general, the problems with the current materials are: (1) there has not been a<br />

recent job or task analysis; the instruction on electricity was adopted from simplified<br />

older physics courses*, (2) there are opportunities for course improvements including<br />

*Physicists regard the content and sequence of BETT to be outmoded theoretically. “The study of<br />

electricity at rest - “electrostatics” - used to bulk large in elementary physics. It was all that was<br />

133<br />



instructional design, test items, and laboratory experience that should be explored, and<br />

(3) students do not seem to develop a good practical understanding of electricity and<br />

electronics. Attention to all of these issues could lead to lower attrition and setback rates,<br />

and improved transfer to job tasks.<br />

Practical Test<br />

A practical hands-on performance exam was developed by NPRDC to test the<br />

transfer ability of BETT curriculum to more relevant job situations. The BETT course<br />

objectives were evaluated to determine which would be the most common to the job a<br />

technician might do in the fleet. From this analysis five hands-on test questions were<br />

developed using real components, resistors, capacitors, conductors, and a flashlight<br />

(battery) which tested the trainee’s ability to recognize and identify electrical<br />

components, determine a component's operating condition using a multimeter, and analyze its effect in an operating circuit. Since the BETT course materials focus on use of the multimeter, the test assumed a similar focus. Prior to giving the test, the school staff

evaluated it and thought that their students would have little difficulty achieving a good<br />

score.<br />

The test was given at two Navy class 'A' schools: once in 1986 at the Avionics

Technician class ‘A’ school in Memphis, and again in 1988 at the Electrician’s Mate<br />

class ‘A’ school in Great Lakes, Illinois.<br />

The Memphis Study.<br />

The hands-on test was given to determine whether students could apply knowledge<br />

and skills learned in BETT in practical situations. The data from the hands-on<br />

performance test show that these students were performing at very low levels. The mean score for this test was 61.3 (n=105) of 104 possible points. The mean score was

considerably lower than the end-of-course-curriculum test scores and would be<br />

considered below passing in most Navy schools.<br />

The Great Lakes Study.<br />

In June 1987 the Chief of Naval Education and Training (CNET) designated the<br />

Electrician’s Mate (EM) ‘A’ School at Great Lakes, Illinois a “model school.” The goal<br />

was “to apply the best techniques and instructional technologies available... so that we<br />

will have in place curriculum, technologies, and management techniques which reflect<br />

the very best we currently know about teaching and learning.” The Navy Personnel<br />

Research and Development Center (NPRDC) was asked to participate in the model<br />

school effort.<br />

Our first research effort was to evaluate EM ‘A’ school Phase I training, which is<br />

the basic electronics and electricity (BETT) portion of the course. The hands-on<br />

performance test was given to 44 trainees from the first two phases of the course and 23 trainees awaiting initial instruction. The objective was to determine if Phase I and Phase II trainees could solve practical problems using the knowledge and skills taught in EM

known of electricity two centuries ago, and tradition dies hard. It makes a poor beginning for<br />

modern electric circuits. Now you need some knowledge of it for atomic physics. How much you see and learn of this part of physics will depend on apparatus, weather, and instructor. On the whole the less the better." From Rogers, E.M. (1977). Physics for the inquiring mind. Princeton,

NJ: Princeton University Press, page 533.<br />


134<br />



‘A’ and what knowledge a typical trainee brings with him to the school.<br />

All subjects were given the six practical problems to solve. The mean score for the<br />

two trained groups was 44.6 out of 140 points possible. An additional test item<br />

developed by the EM school staff accounts for the increase in total possible points.<br />

The average trainee had difficulty measuring values in simple electrical devices<br />

using the multimeter. For the most part the trainees know how to use the multimeter.

However, they have difficulty knowing when or where to use it. Further, even after<br />

completing Phase II training, most trainees are not able to accurately interpret meter<br />

readings to identify an open or short, which is fundamental to equipment maintenance.<br />

Trainees did much better recognizing the various electrical components than they<br />

did measuring them. However, less than 50 percent of the Phase I and Phase II trainees were able to identify a capacitor, and a significant number of Phase I trainees had problems identifying a battery, conductor, and resistor. After the second phase, component identification

improves significantly.<br />

Alternative Approaches<br />

Over the last 40 years a number of alternative approaches to teach BETT have been

developed and tested. This section will summarize some of the more significant work in<br />

this area. While none of these projects specifically reports cost data, all of them report<br />

decreases in attrition and setback rate and some report decreases in course length. All of

the decreases directly relate to cost savings.<br />

The first test was done in the Navy School of Electronics in 1949 (Johnson 1951).<br />

The course that was changed was the basic electricity and electronics for radio, sonar,<br />

and radar maintenance. The results were that the course was shortened from 26 to 18<br />

weeks, attrition as compared to a control group was reduced by 66% and the setback rate<br />

dropped significantly.<br />

The second was the LIMIT project done by HumRRO in the late 1950s (Goffard,

Heimstra, Beecroft & Oppenshaw 1960). It reorganized the three week basic electricity<br />

section of a field radio repair according to job-oriented training (JOT) principles. A<br />

comparison of conventional students with JOT students showed that the latter achieved<br />

higher test scores.<br />

The third was project REPAIR again done by HumRRO in the late 1950s and early<br />

1960s (Brown et al. 1959, Shoemaker 1960). The course modified was the entire field

radio repair course. Approximately 100 students completed the new field radioman’s<br />

class. When their performance was compared with that of graduates of the traditional<br />

class, it was found that they were "significantly superior" to the traditional class in four of seven tests administered--troubleshooting, test equipment, repair skills, and

achievement. No improvement was noted on the alignment, manuals, and schematics<br />

tests. An interesting finding was that the experimental course graduates were superior to<br />

the standard course graduates on each of the 8 problems that made up the troubleshooting<br />

test. This is impressive since 3 of the problems involved equipment on which the<br />

experimental course students received only 4 hours of familiarization training, compared<br />

with 38 hours of training for each student in the standard course.<br />

The fourth project was X-ET which was done at NPRDC in the mid 1960s<br />

(Pickering & Anderson 1966, VanMatre & Steinemann 1966, Steinemann, Harrigan, &<br />

VanMatre 1967). An experimental electronics technician (X-ET) course was developed<br />

135<br />

. .


that differed from ongoing courses in that it accommodated students previously<br />

considered unqualified in terms of aptitude scores and education. Training was oriented<br />

towards job skills and minimized nonessential mathematics and electronics theory. The

results showed that the X-ETs were taught to required levels of proficiency in a<br />

substantially shorter time than in the conventional course. In follow up studies of job<br />

performance it was found that, in general, the X-ETs were performing their duties<br />

satisfactorily in comparison with a control group and on the basis of ratings by<br />

supervisors and peers. They were superior to control ETs in troubleshooting, even<br />

though they scored lower on paper-and-pencil tests of electronics knowledge.<br />

The fifth project, SUPPORT, applied JOT to the Army’s medical corpsman’s course<br />

(Ward, Fooks, Kern & McDonald 1970). (This was not a BETT revision.) The course was changed from a lecture-based, theory-oriented course to a more job-oriented course

where the content was organized so the relevance of each new topic was readily<br />

apparent. The evaluation revealed that JOT students performed better than<br />

conventionally trained corpsmen in 21 out of 26 tests, including both paper-and-pencil<br />

tests and extensive job-sample, simulated performance tests. In addition, JOT students<br />

were faster than conventional corpsmen in attending to serious battlefield wounds.

There was a project related to the SUPPORT project that was aimed at extending the JOT<br />

methods used in the corpsmen training to radio operator training (Goffard, Polden &<br />

Ward 1970). The findings were that the recycle rate for trainees was reduced by 30<br />

percent in comparison with the standard course, and attrition was reduced by about 50<br />

percent. These outcomes were achieved even though the JOT classes were 40 percent<br />

larger and contained twice as many mental category IV personnel as the standard course.<br />

The final project was APSTRAT (Weingarten, Hungerland, Brennan, & Allred<br />

1971). This project was specifically targeted for low aptitude personnel admitted under<br />

Project 100,000. The findings were that the redesigned Army field wireman’s course had<br />

35 percent less attrition and that setback rates were cut from 30 percent to zero.

Future Direction for BETT Training Development<br />

Based on the findings of the above studies and the results of the recent NPRDC<br />

hands-on practical tests, below are two alternative approaches to BETT training that<br />

could be used to make future Navy technicians better equipped to maintain the<br />

sophisticated weapons systems in tomorrow's Navy.

Develop a job oriented BETT course that is generic to all electrical schools. The<br />

basic electricity front-end that has been added to the ‘A’ schools would be converted<br />

from an abstract, mathematics- and physics-knowledge-oriented course to one where job-relevant skills are practiced in a situation of actual use. The current 'A' school phases

would remain the same. The new front-end training would build on the trainee’s<br />

knowledge of familiar electrical devices to teach basic electrical operation and<br />

maintenance concepts. The knowledge acquired using these devices should transfer to<br />

the equipment used in the later phases of ‘A’ school and on-the-job. The new job<br />

oriented training would stress hands-on trainee performance. Hands-on experience would<br />

increase from the current twenty-five percent to sixty percent or more of total class time.<br />

The training would be developed, implemented and evaluated at one electrical school to<br />

determine the feasibility of implementing it in other electrical schools.<br />

136


Develop job oriented electricity theory training using equipment and tasks specific<br />

to each ‘A’ school. The basic electricity theory training would be integrated into the ‘A’<br />

school equipment operation and maintenance lessons. There would be no separate frontend<br />

theory training. The trainee would learn basic electrical operation and maintenance<br />

concepts on the equipment used on-the-job in the fleet or on reasonable simulations. The<br />

training would be sequenced so the easier/more familiar devices would be taught first,<br />

with more difficult concepts and techniques being taught with the more complicated<br />

devices. For example, for initial basic theory training, a radio receiver at ET ‘A’ school,<br />

or the small boat lighting system at EM ‘A’ school could be used to teach students basic<br />

circuit operation, preventive, and corrective maintenance methods. Those simple devices<br />

could be followed with more complicated devices that have more advanced concepts. As

in the generic option the emphasis would be placed on trainees learning hands-on<br />

practical skills. Laboratory time would increase to allow sufficient time for the trainees<br />

to become skilled in the application of the theories and concepts learned.<br />

References<br />

Brown, G., Zaynor, W., Bernstein, A., & Shoemaker, H. (1959). Development and evaluation of an improved field radio repair course. HumRRO-TR-58-59. Alexandria, VA: Human Resources Research Organization.

Goffard, S., Heimstra, N., Beecroft, R., & Openshaw, J. (1960). Basic electronics for minimally qualified men: An experimental evaluation of a method of presentation. HumRRO-TR-61-60. Alexandria, VA: Human Resources Research Organization.

Goffard, S., Polden, D., & Ward, J. (1970). Development and evaluation of an improved radio operator course (MOS 05&X7). HumRRO-TR-70-8. Alexandria, VA: Human Resources Research Organization.

Johnson, H. (1951). The development of more effective methods of training electronics technicians. Washington, DC: Working Group on Human Behavior Under Conditions of Military Service, Research and Development Board, Department of Defense.

Pickering, E., & Anderson, A. (1966). A performance-oriented electronics technician training program: I. Course development and evaluation. STB 67-2. San Diego: U.S. Naval Personnel Research Activity.

Shoemaker, H. (1960). The functional context method of instruction. IRE Transactions on Education, Vol. E-3, No. 2, June 1960, 52-57.

Steinemann, J., Harrigan, R., & VanMatre, N. (1967). A performance-oriented electronics training program: IV. Fleet follow-up evaluation of graduates of all classes. SRR-68-10. San Diego: U.S. Naval Personnel Research Activity.

VanMatre, N., & Steinemann, J. (1966). A performance-oriented electronics technician training program: II. Initial fleet follow-up evaluation of graduates. STB-67-15. San Diego: U.S. Naval Personnel Research Activity.

Ward, J., Fooks, N., Kern, R., & McDonald, R. (1970). Development and evaluation of an integrated basic combat/advanced individual training program for medical corpsman (MOS 91A10). HumRRO-TR-70-1. Alexandria, VA: Human Resources Research Organization.

Weingarten, K., Hungerland, J., Brennan, M., & Allred, B. (1971). The APSTRAT instruction model. HumRRO PP-6-71. Alexandria, VA: Human Resources Research Organization.

137<br />



USING EVENT HISTORY TECHNIQUES TO ANALYZE TASK PERISHABILITY:<br />

A SIMULATION<br />

Stanley D. Stephenson

Southwest Texas State University<br />

Julia A. Stephenson<br />

University of North Texas<br />

Until now, task perishability, the point in time at which a<br />

task drops out of an airman's inventory of tasks performed, has

not been researched. This lack of research could be for two reasons.<br />

First, task performed/not performed is usually of more<br />

interest than is when a task leaves a job inventory. Second,<br />

perhaps measurement techniques for determining task perishability<br />

are either unavailable or unknown. In any event, little is known<br />

about task perishability.

For a variety of reasons, knowledge about task perishability<br />

would be useful, primarily in training. For instance, the decision<br />

about when and where to train a task (formal school or OJT)<br />

could depend on how long the task is going to be used. Perhaps<br />

the most obvious use of task perishability would be in cross-training. If a task can be determined to have a relatively short

residual life, perhaps training on that task is not necessary,<br />

even though the task is currently in the job inventory of comparable<br />

time-in-grade airmen. Also, task perishability is

obviously related to skill decay. A skill can be retained long<br />

after a task stops being performed. However, if a task perishes<br />

from an airman's inventory of tasks performed, the corresponding<br />

skill will eventually leave that airman's inventory of skills.<br />

Before skill decay can be measured, information about task perishability<br />

should be known.<br />

This paper will study the feasibility of measuring task perishability<br />

using a technique called Event History Analysis.<br />

There are two major features of event history as it applies to<br />

task perishability. First, it incorporates time in the analysis;<br />

i.e., how long did a task stay in an airman's job inventory? Second, it has the ability to handle censored data. Censored data is data on which you have only partial information. For

example, not all airmen complete their first-term enlistment. Of

those who leave early, some will have stopped doing a task, but<br />

some will still be performing the task. Consequently, information<br />

about when the task would have left the censored airmen's<br />

job inventories if they had stayed in the Air Force is unknown:<br />

however, that the task "lived" until the point of censoring is

known. Rather than discarding these censored data, event history<br />

incorporates the available information and, although incomplete,<br />

produces more precise estimates of task survivability.<br />

To study task perishability with event history techniques,

this paper used a simulated data base of a type which could be<br />

derived from the data produced by the USAF Occupational Survey<br />

Program and other sources.<br />

USAF Occupational Survey Program<br />

The USAF job analysis program is frequently referred to by<br />

the term, CODAP (or TI/CODAP) (Christal & Weissmuller, 1988).<br />

CODAP usually involves taking a snapshot of the entire work force<br />

at one point in time; i.e., rather than being longitudinal, the<br />

data collected is vertical. Consequently, the data do not provide<br />

information about what an individual airman does over a 20<br />

138


year career. Instead, CODAP provides information about what all<br />

airmen are doing in groups such as first term, second term, or

career enlistees. Also, the survey is administered to essentially

100 percent of the career field and produces a response<br />

rate of over 80 percent.<br />

Event History Analysis<br />

Event history analysis enables the researcher to determine<br />

probabilities associated with the length of time for a binary,<br />

dependent variable to change states. Another requirement is<br />

knowledge of the time from the start of the experiment to the<br />

change in state of the dependent variable. Both the origin time<br />

and the exact point at which the dependent variable changes must<br />

be precisely defined. Also, the length of time must always be a<br />

positive value. The last assumption is that the sample should be<br />

homogeneous (Cox & Oakes, 1984).<br />

One of the strengths of event history analysis is the ability<br />

to include some information concerning censored data. An item is<br />

considered to be censored if it is removed from the sample before<br />

the experiment is terminated and the dependent variable has not<br />

changed states. A second type of censoring occurs if the experiment<br />

ends before the dependent variable changes. In most parametric<br />

statistical analyses, such data would have to be omitted<br />

from the sample. However, the fact that the item had not changed<br />

at the point of leaving or ending the experiment can provide some<br />

relevant information that should be incorporated into probabilities<br />

associated with the time at which the dependent variable<br />

changes states.<br />

Several probabilities are associated with event history analysis.<br />

The failure and the survival functions represent cumulative<br />

distributions about when the dependent variable changes<br />

states. Failure is defined as the change in the dependent variable;<br />

survival is the lack of change. The hazard function represents<br />

the conditional probability that the dependent variable<br />

will change states in a specific time period, given that it had<br />

not changed states in the previous period (Kalbfleisch & Prentice,<br />

1980). The mean life residual function represents the<br />

average length that the dependent variable will survive beyond<br />

the specified time period (Oakes & Dasu, 1990).

All of these functions are related mathematically. E.g.,<br />

once the survival curve is estimated, the mean life residual can<br />

be determined. The most widely used method for computing the<br />

survival function is the product limit estimator proposed by<br />

Kaplan and Meier (1958).<br />
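Stated compactly (standard notation supplied here for a nonnegative time-to-event variable T; this notation does not appear in the original paper):

\[ F(t) = \Pr(T \le t), \qquad S(t) = 1 - F(t), \]
\[ h(t) = \frac{f(t)}{S(t)}, \qquad \mathrm{mrl}(t) = E[\,T - t \mid T > t\,] = \frac{1}{S(t)} \int_{t}^{\infty} S(u)\,du, \]
\[ \hat{S}(t) = \prod_{t_i \le t} \left( 1 - \frac{d_i}{n_i} \right), \]

where f(t) is the density of T, d_i is the number of failures (tasks perishing) observed at time t_i, and n_i is the number still at risk (still performing the task and not yet censored) just before t_i; the last expression is the Kaplan-Meier product-limit estimator.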

Method<br />

At first glance, event history analysis does not seem appropriate<br />

for examining Air Force occupational data. One problem is<br />

that the Air Force maintains little data on persons who leave the<br />

service. Also, the actual time that a person stops doing a specific<br />

task is not recorded. However, data gathered by occupational<br />

surveys do meet the required assumptions.<br />

Event history requires that the dependent variable be<br />

binary. For task perishability, this translates to whether or<br />

not a task is being performed. In an occupational survey,<br />

respondents check if they are performing a task; thus, task performance<br />

is known.<br />

The second assumption of event history is that the origin and<br />

139



exact point at which a job leaves a person's inventory must be<br />

specified. Actually, however, the only information that is necessary<br />

is the length of time that a person holds a specific task<br />

in the job inventory. To meet this requirement, a small mental<br />

transformation of the data is necessary. Occupational surveys<br />

provide information on the percent members performing a task in<br />

each time interval. The difference in percent members performing

over two intervals is in essence a measure of those who have<br />

stopped doing a task. Therefore, occupational survey data meets<br />

the two primary assumptions of event history analysis. However,<br />

a problem is the inclusion of the censored data. While the Air<br />

Force does have information regarding AFSC attrition rates,<br />

whether the specific task is in an airman's inventory when he<br />

leaves the service (attrites) is unknown.<br />

For the purposes of this study, we generated a 1000 person<br />

data base. This data base included actual data points for a task<br />

leaving an airman's job inventory, as well as censored data,<br />

which simulated those airmen who leave the Air Force prior to the<br />

task leaving their inventory. While this model is not specific<br />

to any career field, it does incorporate several facts which are<br />

intrinsic to the job/career development in the Air Force. For<br />

instance, many airmen spend up to 12 months in training before<br />

actually being assigned to a work place. Thus, this model starts<br />

simulating at the thirteenth month, which is actually the first<br />

point in time that a task could leave an incumbent's inventory.<br />

Another consideration is the large change in status at the 48th<br />

month. At this point many airmen leave the service: of those who<br />

do continue in the Air Force, some change career fields. This<br />

change results in many censored data points at the 48th month.<br />

In summary, single task performance data for an initial set<br />

of 1000 airmen was simulated over a 6-year (72-month) period.

Using the type of data available from Occupational Survey<br />

Reports, percent members performing for each month interval were<br />

created. Censoring was also generated for this simulation.<br />

Although exact censoring data cannot be determined from current<br />

Air Force data bases, historical attrition data are available.<br />

The censored data values can then be estimated from the attrition<br />

data using the information from current percent members performing<br />

a single task. A total of 300 (30%) censored data points<br />

were inserted into the data base using a random number procedure.<br />

From this simulated data base, three functions were calculated:<br />

the Survival function, the Hazard function, and the Mean<br />

Life Residual function. All calculations were performed using<br />

the LIFETEST procedure in SAS. Examples of the survival and the mean life residual functions are given in this paper.
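As an illustrative sketch only (the study itself used SAS; the distribution, parameter values, and censoring mechanism below are assumptions made for this example, not properties of the simulated study data base), the same kind of computation can be reproduced in a few lines:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000                                   # simulated airmen

    # Hypothetical month at which the task perishes: nothing can perish
    # before month 13 (the first month after training), and observation
    # stops at month 72.
    perish = np.minimum(13 + rng.exponential(scale=30.0, size=n), 72.0)

    # Randomly censor roughly 30% of the cases (airmen who leave the Air
    # Force while the task is still in their inventory); the censoring
    # time must fall before the unobserved perish time.
    censored = rng.random(n) < 0.30
    time = perish.copy()
    time[censored] = 13 + rng.random(censored.sum()) * (perish[censored] - 13)
    event = ~censored                          # True = perish actually observed

    # Kaplan-Meier product-limit estimate of the survival function.
    order = np.argsort(time)
    time, event = time[order], event[order]
    survival, s = [], 1.0
    for t in np.unique(time):
        at_risk = np.sum(time >= t)
        failures = np.sum((time == t) & event)
        s *= 1.0 - failures / at_risk
        survival.append((t, s))

    # Estimated probability that the task is still performed at month 36.
    s_36 = [p for t, p in survival if t <= 36][-1]

The mean life residual at a given month can then be obtained by numerically integrating the estimated survival curve beyond that month and dividing by its value at that month.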

Results<br />

Figure 1 shows the survival function for the simulated data<br />

base. It represents the probability of an airman at a specific<br />

time period performing the task. For example, at the 36th month,<br />

the probability of an airman still performing this task is .54.<br />

Figure 2 represents the mean life residual function for the<br />

simulated data base. This function can be interpreted as the<br />

average length that an airman will be performing the task beyond<br />

a specific time period. At the 36th month, on average, an airman<br />

will be performing this task 13.8 more months.<br />

140<br />



Figure 4. Mean Life Comparison (mean life residual in months, by month, for the data including censors versus omitting all censors; horizontal axis: months).

Figures 3 and 4 show a comparison between the data base with

all 1000 airmen (event history analysis) and the data base with<br />

700 airmen (i.e., all censored data omitted). The difference in<br />

the two survival functions (figure 3) is greatest at the 48th<br />

month, the point at which censoring is heaviest.<br />

The difference between the two mean life residual functions<br />

(Figure 4) is greatest at the beginning of the 13th month, basically<br />

because excluding the 300 censored data points removes some<br />

information about how long a task is performed. At the 48th<br />

month the two curves become very similar. Thus, censoring after<br />

the first term has less of an effect on the mean life residual<br />

function.<br />

This data could also be presented in a table format. A portion<br />

of these functions is shown in Table 1.<br />

Table 1
Comparison Data

                Survival Function         Mean Life Residual
Month        Include      Omit         Include      Omit
             Censors      Censors      Censors      Censors
  36          .544         .377         13.307       9.273
  37          .527         .364         13.264       8.871
  38          .516         .340         12.647       8.2U
  39          .496         .313         12.078       7.969
  40          .479         .293         11.472       7.602

Discussion<br />

The results of this study show that event history analysis<br />

can be used to investigate task perishability. Due to the method<br />

of collecting task data in the Air Force's Occupational Survey<br />

Program, accurate figures can be obtained for the change in state<br />

of the binary variable, e.g., task perishability. Historical<br />

attrition data are available for all career fields. Thus, censoring<br />

is the only unknown variable, and it can be accurately<br />

estimated by combining occupational and attrition data. Therefore,<br />

an appropriate data base can be created for any AFSC.<br />

The results of the analysis also show the advantage of using<br />

event history to analyze task perishability. Figures 3 and 4<br />

vividly illustrate the difference in analyzing task perishability<br />

141<br />



Figure 1. Survival Function.


using event history analysis, which can accommodate censored<br />

data, and using conventional analytical procedures, which essentially<br />

discard censored data. Estimations of both the survival<br />

and mean residual life functions are more accurate using event<br />

history analysis. Therefore, the results of this study strongly<br />

suggest that analyzing task perishability with event history<br />

techniques should continue to be studied.<br />

The use of event history analysis to examine occupational<br />

data, such as task perishability, is a new application of this<br />

statistic. Thus, several research issues need further examination.<br />

Of primary concern is the inclusion of censored data. As<br />

mentioned earlier, the Air Force does not maintain records of the<br />

tasks performed by persons who attrite. Therefore, determining<br />

the number of censored data points at each interval will have to<br />

be modeled. A logical start point would be to use the known<br />

information on percent (of those who complete the occupational<br />

survey) members performing as an estimation of the percentage of<br />

those who attrited but still held the task in their inventories.<br />

The assumption that 100% of the airmen are performing the<br />

task at the start of the career field raises a potential theoretical<br />

issue. However, the math underlying the model is primarily<br />

based on conditional probabilities, thus deviating from this<br />

assumption would not seem to have a severe effect on the task<br />

performance probabilities. Another theoretical question concerns<br />

the homogeneity of the persons in a particular career field. A<br />

more accurate analysis of when a task leaves an airman's job<br />

inventory may be accomplished by sub-grouping the career field<br />

with a covariate such as present grade, skill-level, or gender.<br />

Also, some tasks may perish more rapidly for airmen who are in<br />

their second career field. These and other theoretical issues<br />

need to be researched.<br />

An area of interest for further research is task emergence.<br />

The model set forward in this study could easily be restructured<br />

to analyze when a task enters a job inventory. A strength of<br />

this type of analysis is that it would provide information on a<br />

continuum, by month, instead of chunking by 1st term, 2nd term,<br />

etc. Perhaps task emergence and task perishment could be linked<br />

to provide more information on when and by whom a task, or group<br />

of tasks, is performed in a career field.<br />

References<br />

Christal, R. E., & Weissmuller, J. J. (1988). Job-task inventory<br />

analysis. In S. Gael (Ed.) The job analysis handbook for business,<br />

industry, and government (Vol II), pp. 1036-1050. New<br />

York: Wiley.<br />

Cox, D. R., & Oakes, D. (1984). Analysis of survival data. New

York: Chapman and Hall.<br />

Kalbfleisch, J. D., & Prentice, R. L. (1980). The statistical analysis

of failure time data. New York: Wiley.<br />

Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from<br />

incomplete observations. American Statistical Association Journal, 53, 457-481.

Oakes, D., & Dasu, T. (1990). A note on residual life. Biometrika,

77, 409-410.<br />

143<br />




A FIRST LOOK AT THE EFFECT OF INSTRUCTOR BEHAVIOR<br />

IN A COMPUTER-BASED TRAINING ENVIRONMENT

STANLEY D. STEPHENSON<br />

SOUTHWEST TEXAS STATE UNIVERSITY<br />

Computer-based Training (CBT) research has typically focused<br />

on comparing a CBT course with a corresponding traditional<br />

instruction (TI) course. Compared to a similar TI course, CBT<br />

generally, but not always, produces increases in learning and<br />

retention while concurrently requiring less time than TI<br />

(Fletcher & Rockway, 1986; Goodwin et al., 1986; Kulik & Kulik, 1986, 1987; McCombs et al., 1984; O'Neil, 1986). However, CBT results have not always been positive; there are many instances

in which CBT did not produce increases in performance or<br />

decreases in learning time (Goodwin et al., 1986; McCombs et al.,<br />

1984). In general there has been very little research on maximizing<br />

performance within a CBT system (Gillingham & Guthrie, 1987).<br />

Conversely, there is a long history of research on variables<br />

which influence achievement in TI systems. One of the most<br />

researched variables is instructor behavior. TI research has<br />

produced a relatively high degree of consensus as to what an<br />

effective instructor does versus what a not-so-effective instructor<br />

does, with effective being defined in terms of academic<br />

achievement (Brophy, 1986; Brophy & Good, 1986; Rosenshine, 1983). Yet, CBT research has neglected the role of the instructor

(Moore, 1988). Therefore, little is known about whether or<br />

not TI instructor variables transfer to CBT.<br />

In one of the few studies which did examine the role of the<br />

CBT instructor, Moore (1988) found that students who had positive<br />

teachers scored significantly higher than those in classes with<br />

negative teachers. McCombs et al. (1984) reviewed various early

CBT courses and found that two factors were critical to the success<br />

of the CBT courses. These were: (a) adequate opportunities<br />

for student-instructor interactions, and (b) the incorporation of<br />

group activities with individualized training. McCombs (1985)<br />

reviewed the role of the instructor in CBT from a theoretical<br />

perspective and developed several practical suggestions for<br />

instructor use. One of her suggestions was that "...instructors<br />

must have meaningful roles in the management and facilitation of<br />

active student learning, if the CBT system is to be maximally<br />

effective" (p. 164).<br />

As noted by McCombs et al. (1984), student-instructor interaction

was a critical factor with regard to success of a CBT system.<br />

This is a significant finding since one of the most consistently<br />

reported positive TI instructor behaviors is frequent but<br />

short student-instructor interactions; i.e., an increase in student-instructor<br />

interactions produces an increase in achievement<br />

(Brophy, 1986; Brophy & Good, 1986; Rosenshine, 1983). Therefore,<br />

a TI instructor behavior which may transfer to CBT is student-instructor<br />

interaction.<br />

The purposes of this study were two-fold. First, this study<br />

was an attempt to begin to explore the effect of instructor<br />

behavior on achievement in CBT. Second, this study specifically<br />

examined the effect of student-instructor interaction in CBT.<br />

Based on the TI instructor literature, it was hypothesized that<br />

144<br />



increased student-instructor interaction would produce increased<br />

achievement.<br />

Method<br />

Subjects<br />

Subjects were 25 (15 female and 10 male) college juniors and<br />

seniors enrolled in a Business Statistics class. As part of a<br />

project designed to teach students how to use computer spreadsheet<br />

software to perform statistical computations, Ss volunteered<br />

to participate in a spreadsheet tutorial for extra credit.<br />

The extra credit was awarded for project completion, not for<br />

project performance. All Ss completed a survey to assess their<br />

personal computer (PC) literacy.<br />

Experimental Materials<br />

The spreadsheet tutorial was part of a larger commercial<br />

software tutorial package designed for an integrated spreadsheet-word

processing-database program. The tutorial is basically linear<br />

and learner-controlled; however, Ss did have the capability<br />

to repeat a lesson if desired.<br />

For the purposes of this study, the larger tutorial was modified<br />

to include just the introduction to the integrated package<br />

plus that portion of the tutorial software devoted to the use of<br />

the spreadsheet. The introduction portion (Part A) contained<br />

four lessons, and the spreadsheet portion (Part B) contained<br />

eight lessons. The tutorials were run on Tandy 1000SX PCs.

An exercise designed to evaluate mastery of the spreadsheet<br />

tutorial commands was added to the experimental software. Since<br />

the students were volunteers from a Business Statistics class,<br />

the exercise used simple statistical concepts as the vehicle for<br />

evaluating spreadsheet mastery. Consequently, the experimental<br />

material consisted of a CBT spreadsheet tutorial modified to<br />

include a statistics-based exercise.<br />

Procedure<br />

Ss were randomly assigned by spreadsheet/PC literacy to one<br />

of two student-instructor interaction modes. Group I (n=13) had<br />

essentially no instructor-initiated interactions. All Group I<br />

interactions were initiated by the student and consisted of<br />

requests by the students for help in overcoming an obstacle in<br />

the tutorial. Group II (n=12) experienced the same type of student-initiated<br />

interactions experienced by Group I. In addition<br />

Group II was exposed to multiple instructor-initiated interactions.<br />

Both groups worked the CBT tutorial in three sessions. In<br />

session one, all Ss started on lesson A1 and worked in the tutorial

for 90 minutes. In the second session, all Ss started on<br />

lesson B1 and worked through the last lesson, B8. In the third

session, all Ss started on lesson B3 and again worked through the<br />

last lesson, lesson B8. Consequently, all Ss had a single exposure<br />

to lessons A1 through A4 and repeated exposure to lessons B1

through B8. Also, since each S went at his/her own speed, Ss'<br />

total time on task varied. At the completion of lesson B8 on day<br />

3, all Ss were given an exercise designed to evaluate their mastery

of the tutorial material. Ss had 30 minutes to work on the<br />

exercise.<br />

During the startup period of the project (i.e., the first 15<br />

minutes of the first session), the instructor responded to all<br />

145



questions in both groups to insure that the Ss were properly<br />

logged into the tutorial. For both groups, the instructor also<br />

responded to all student-initiated interactions with one or more<br />

of three responses: (1) "Try pushing the [ESCAPE] key;" (2) "Try<br />

pushing the [SPACE] bar;" or (3) "Re-boot the system and start<br />

over." These suggestions were given in sequence; e.g., if "Try<br />

pushing the [ESCAPE] key," did not work, then the S was told to<br />

"Try pushing the [SPACE] bar." For Group I Ss, these suggestions<br />

were the only instructor interactions experienced after the first<br />

15 minutes of session one. In a sense, Group I's instructor performed<br />

an impersonal, course administrator role.<br />

In addition to the interactions listed above, Group II Ss<br />

also experienced instructor-initiated interactions. In the first

session, the instructor initiated four interactions with each S.<br />

In session two and three, the instructor initiated three and one<br />

interactions, respectively. These interactions were related to<br />

location of keys on the Tandy keyboard. E.g., shortly before<br />

needing to use the Back Slash (\) key, the instructor would tell<br />

the students where that key was located. Key location was<br />

explained and diagrammed in instructions given to all Ss, but for<br />

most students key location on the Tandy keyboard was a minor<br />

problem due to previous exposure to an IBM keyboard. Instructor-initiated

interactions lasted between 5 and 10 seconds.<br />

It should be noted that in no instance did the instructor<br />

provide information which was not available to the student elsewhere.<br />

Also, in no instance did the instructor comment, provide<br />

feedback, or give praise on the student's performance on the<br />

tutorial.<br />

Dependent Measures<br />

Two dependent measures were recorded. First, the Ss' performance<br />

on the exercise was scored. Second, Ss also recorded

which spreadsheet commands they used. Since most procedures can<br />

be performed in more than one way (e.g., a cell entry can be<br />

changed via an EDIT command or by simply re-typing the entry),<br />

this second measure was recorded to assess how many different<br />

spreadsheet commands were actually used during the exercise.<br />

Results<br />

Means and standard deviations for Spreadsheet Performance and<br />

Use of Spreadsheet Commands are given in Table 1. Due to the<br />

small sample sizes (and possible problems with the assumption of<br />

normality), the Mann-Whitney U non-parametric test statistic was<br />

used to analyze differences between Group I (no instructor-initiated interaction) Ss and Group II (instructor-initiated interaction) Ss.

Table 1<br />

Spreadsheet Performance and Use of Spreadsheet Commands<br />

Means and Standard Deviations<br />

Spreadsheet Performance<br />

Mean SD<br />

Group I (No Interaction) 58.000 18.257<br />

Group II (Interaction) 72.417 7.403<br />

Use of Spreadsheet Commands<br />

Mean SD<br />

Group I (No Interaction) 32.308 7.250<br />

Group II (Interaction) 30.833 8.483<br />
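As a minimal sketch of the nonparametric test used for the comparisons reported below (the score vectors are hypothetical placeholders with the same group sizes as the study, not the actual exercise data):

    from scipy.stats import mannwhitneyu

    # Hypothetical exercise scores (placeholders only), 13 and 12 subjects.
    group_1 = [45, 52, 58, 60, 63, 66, 70, 72, 75, 38, 55, 61, 49]  # no instructor-initiated interaction
    group_2 = [65, 68, 70, 71, 73, 74, 76, 78, 80, 69, 72, 75]      # instructor-initiated interaction

    # Two-sided nonparametric test of a difference in location.
    u_stat, p_value = mannwhitneyu(group_1, group_2, alternative="two-sided")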

146


Exercise Performance<br />

Group II (instructor-initiated interaction) Ss significantly<br />

outperformed Group I (no instructor-initiated interaction) Ss

(Mann-Whitney U = 34.50, p < .017).<br />

Use of Spreadsheet Commands<br />

There was no difference in command usage between Group I Ss<br />

and Group II Ss; (Mann-Whitney U = 82.00, p < .824).<br />

Sex Differences<br />

Sex differences were not significant (for Spreadsheet Performance,<br />

Mann-Whitney U = 56.00, p < .289; for Use of Spreadsheet<br />

Commands, Mann-Whitney U = 69.50, p < .755).<br />

Discussion<br />

The hypothesis that increased student-instructor interaction<br />

would lead to increased achievement was supported. Given the<br />

limited length of the CBT program used in this experiment, the<br />

degree of difference of increased achievement between the two<br />

groups was surprising. For some reason, having the instructor<br />

interact with/take notice of/care about the student affected the<br />

student to the point where it increased his/her achievement.<br />

The underlying cause for the difference in achievement did not<br />

seem to be knowledge. All Ss seemed to "learn" the commands presented<br />

in the tutorial; there was no difference between groups in<br />

the number of commands used to solve the exercise. The difference<br />

was in how well the commands were used.<br />

Nor was the difference in achievement due to praise or<br />

feedback. Neither group received praise for their performance.<br />

Unless relatively brief human interaction is defined as praise,<br />

praise was not a factor in this study. Extra credit for higher<br />

performance on the exercise also was not a factor; all Ss<br />

received the same amount of extra credit regardless of their performance.<br />

A clue as to why Group I Ss did not perform as well as Group<br />

II Ss comes from observations made by the Group instructor. It<br />

seemed that Group I Ss used the space bar more frequently than<br />

did Group II Ss. In this study's tutorial, Ss had the capability<br />

to literally space-bar their way through the tutorial. I.e.,<br />

rather than actually performing the requested tutorial action, Ss<br />

could depress the space-bar and step through the program.<br />

Although not measured, Group I Ss seemed to take this approach<br />

more frequently. Consequently, while both groups were equally<br />

exposed to the material, Group II Ss seem to actually perform the<br />

tutorial more. This 'space-bar' behavior could account for the

difference in achievement. The difference in standard deviation<br />

between the two groups could also be a result of the reduced<br />

practice by Group I Ss.<br />

If the explanation offered above is accurate, it suggests<br />

that brief human interaction serves to keep students on task more<br />

so than no human interaction. If no one is aware of what I am<br />

doing, I am more likely to try to ease my way through the CBT<br />

course. However, if someone is aware of what I am doing, irrespective<br />

of whether or not that someone gives me praise or feedback,<br />

then I had best stay on task.<br />

Due to the manner in which the Group II interactions<br />

occurred, instructor monitoring of the students was confounded<br />

with interaction. For the instructor to know when to interact<br />

147<br />




with an appropriate comment, the instructor had to know when a<br />

student was approaching a particular point in the tutorial. In<br />

order to know this, the instructor had to constantly monitor the<br />

students' progress. Consequently, while the Group I instructor<br />

sat at a desk and waited for students to request assistance, the<br />

Group II instructor was constantly walking around the room and<br />

visually checking on where Ss were in the tutorial. Therefore,<br />

it may be that monitoring, and not interaction, was the basis for<br />

Group II's higher achievement.<br />

These results add to the results reported by Moore (1988) who<br />

found that, even in CBT, teachers with positive attitudes produced<br />

higher achievement than teachers with negative attitudes.<br />

Evidently, instructor interaction can also affect achievement.

Whether or not the interaction needs to be tied to course content<br />

is unknown. In this study, the instructor's comments were not<br />

content-based. Therefore, it may be that CBT instructors should

interact with students in order to maximize achievement, but that<br />

the interactions may not need to be related to the material being<br />

covered.<br />

Implications<br />

The relatively short-term nature of the tutorial used in this<br />

experiment obviously limits the generalization of this study's<br />

results. That limitation notwithstanding, the specific conclusion

from this study is that brief instructor-initiated interactions<br />

can increase achievement in CBT. However, instructor monitoring<br />

without interaction may produce the same result.<br />

Since the role of the instructor in CBT is frequently undefined,<br />

the results from this study give some direction as to what<br />

a CBT instructor can do to influence achievement. Moreover,<br />

since instructor-initiated interactions are controlled by the<br />

instructor, these interactions can be both built into the larger<br />

learning system (which includes the CBT subsystem) and also<br />

included in the instructor evaluation system.<br />

A larger implication from this study is that instructor<br />

behavior does seem to influence achievement in CBT. The results<br />

obviously support Moore's (1988) research and McCombs' (1985) suggestions.

There is simply something about having another human<br />

around and aware of your actions that alters your behavior. Even<br />

in the best designed, best built, and best implemented CBT systems,<br />

instructor behavior may still influence achievement.<br />

Rather than trying to design a CBT system which does away with<br />

the instructor (or to design a system which essentially ignores<br />

the instructor), CBT developers should try to find ways in which<br />

to use instructor presence to maximize achievement.<br />

References<br />

Brophy, J. E. (1986). Teacher influences on student achievement.<br />

American Psychologist, October, 1069-1077.

Brophy, J. E. & Good, T. L. (1986). Teacher behavior and<br />

student achievement. In M. C. Wittrock (Ed.), Third handbook of research on teaching: 328-375. New York: Macmillan.

Fletcher, J. D., & Rockway, M. R. (1986). Computer-based education in the military. In J. A. Ellis (Ed.), Military contributions to instructional technology. New York: Praeger.

Gillingham, M. G., & Guthrie, J. T. (1987). Relationships between CBT and research on teaching. Contemporary Educational Psychology, 12, 189-199.

148

Goodwin, L. D., Goodwin, W. L., Nansel, A., & Helms, C. P.

(1986). Cognitive and affective effects of various types of<br />

microcomputer use by preschoolers. American Educational<br />

Research Journal, 23, 348-356.<br />

Kulik, C. C., & Kulik, J. A. (1986). Effectiveness of computer-based

education in colleges. AEDS Journal, Winter/Spring,<br />

81-108.<br />

Kulik, J. A., & Kulik, C. C. (1987). Review of recent<br />

research literature on computer-based instruction. Contemporary<br />

Educational Psychology, 12, 222-230.

McCombs, B. L. (1985). Instructor and group process roles in

computer-based training. Educational Communication and Technology<br />

Journal, 33, 159-167.<br />

McCombs, B. L., Back, S. M., & West, A. S. (1984). Self-paced<br />

instruction: Factors critical to implementation in Air Force<br />

technical training - A preliminary inquiry. (AFHRL-TP-84-23).<br />

Lowry Air Force Base, CO: Air Force Human Resources Laboratory,

Training Systems Division.<br />

Moore, B. M. (1988). Achievement in basic math skills for low-performing students: A study of teachers' affect and CAI. The

Journal of Experimental Education, 5, 38-44.<br />

O'Neil, H. F., Anderson, C. L., & Freeman, J. A. (1986).<br />

Research in teaching in the Armed Forces. In M. C. Wittrock

(Ed.), Third handbook of research on teaching: 971-987. New<br />

York: Macmillan.

Rosenshine, B. (1983). Teaching functions in instructional<br />

programs. The Elementary School Journal, 83, 335-351.<br />

149


Transfer of Training with Networked Simulators'<br />

David W. Bessemer<br />

U.S. Army Research Institute<br />

Field Unit-Fort Knox, Fort Knox, Kentucky<br />

The Armor Officer Basic (AOB) Course in the Fort Knox Armor<br />

School includes three weeks of tactical instruction followed by<br />

ten days of Mounted Tactical Training (MTT) in the field. During<br />

MTT, students rotate among tank crew and unit leader positions as<br />

they perform platoon mission exercises. Late in 1988, two days<br />

of similar training in networked tank simulators were added just<br />

before the MTT. Additional platoon movement training using<br />

wheeled vehicles also began with the next class after simulator<br />

training started. These changes set up a quasi-experimental<br />

comparison between baseline classes that graduated before the

changes and later classes that received added training. Student<br />

records provided performance measures in an interrupted time-series

design (Cook & Campbell, 1979) that permitted transfer<br />

from simulator training to field performance to be assessed.<br />

The simulator networking (SIMNET) system used for the AOB<br />

training was produced as a test-bed for Defense Advanced Research<br />

Projects Agency R & D on technologies capable of large-scale

interactive simulation of land combat. Training devices using<br />

these technologies could provide increased collective training<br />

for units, while reducing factors such as cost, time, and maneuver<br />

space that now restrict combined arms training. However,<br />

unit training in simulators must be shown to be effective to<br />

justify further development and acquisition of networked simulator<br />

training devices. Evidence supporting the effectiveness of<br />

SIMNET training for some platoon tasks was obtained in a test<br />

with a small number of units (Gound & Schwab, 1988). Results

reported here supplement the previous findings by specifically<br />

examining officer training for platoon leadership. An important<br />

issue in interpreting the results was whether SIMNET training<br />

caused the observed effects or if other factors contributed, such<br />

as the added wheeled vehicle training.<br />

Method

Sample

One group of 1098 students was enrolled in 24 AOB classes graduating in a 68-week baseline period. Another group of 607 students was from 12 later classes in a 33-week period after

tactical training in SIMNET was added to the course. There were<br />

one to five student platoons in a class, adding up to 71 platoons<br />

'The views , opinions, and findings contained in this paper<br />

are those of the author, and should not be construed as the<br />

official position of the U.S. Army Research Institute or as an<br />

official Department of the Army position, policy, or decision.<br />


150<br />



in baseline classes and 39 platoons in SIMNET-trained classes.<br />

Platoons were supervised by a group of 16 officer and senior<br />

noncommissioned officer (NCO) instructors, called Team Chiefs,<br />

each assisted by a team of NCO tank crew instructors. Every

platoon had one Team Chief guiding all of its tactical training.<br />

Equipment

SIMNET Training. Training was conducted in the Combined

Arms Tactical Training Center (CATTC) at Fort Knox that houses<br />

the SIMNET system. AOB classes used four M1 tank modules per

platoon with a terrain data base portraying the Fort Knox areas<br />

used for AOB field training. Vehicle crews operate SIMNET

modules interactively through a local area computer network (LAN)<br />

in a manner similar to real vehicles. Scenes shown in simulated<br />

sights and vision blocks respond to control inputs to create the<br />

illusion of moving and fighting on the battlefield. Crews can<br />

detect and shoot enemy vehicles, and communicate both within the<br />

crew and to other vehicles and organizations. Operating together

as a unit, crews can use many standard tactical techniques to<br />

execute a combat mission.<br />

Field Training. Each AOB student crew in SIMNET-trained

classes (except for the first such class) used High Mobility<br />

Multi-Purpose Wheeled Vehicles (HMMWVs) for some MTT-like preparatory<br />

training on cavalry operations. All student crews used an<br />

M60A3 tank (U.S. Department of the Army, 1979) and basic issue<br />

items furnished with the tank during MTT.<br />

Training Procedure

SIMNET Exercises. In the first day of simulator training,<br />

the students were introduced to the operation of SIMNET tank<br />

modules, and conducted a tactical road march mission as a tank<br />

company. Platoons then practiced techniques of movement and

battle drills, and performed a movement to contact mission<br />

against static unreactive target vehicles placed on the terrain.<br />

Two force-on-force (FOF) exercises were completed on the next<br />

day, with pairs of platoons alternating in offensive and defensive<br />

roles. For every exercise, the platoon Team Chief selected<br />

two students to act as platoon leader and platoon sergeant. The<br />

Team Chief gave these students a company-level mission order, and<br />

allowed them about an hour to plan and prepare the platoon<br />

mission. The Team Chief controlled the execution of the mission<br />

by acting in the role of company commander. After an exercise,<br />

the Team Chief led an after-action review (AAR) in which the<br />

platoon assembled to discuss strong and weak points exhibited in<br />

planning and executing the mission. After a FOF exercise, the<br />

opposing platoons met for a joint AAR.

Field Exercises. Student platoons completed from two to

four on-tank exercises per day during MTT. For several days the<br />

exercises were relatively elementary, gradually increasing in<br />

complexity and difficulty. Initially, the exercises consisted of<br />

151


oad marches and unopposed cross-country movement. Then there<br />

were several movement to contact and other simple offensive<br />

missions with light simulated enemy contact. Defensive exercises<br />

began the relatively advanced level of training. Complex offense<br />

and defense mission exercises were intermingled in the later days<br />

of the MTT. The student platoons were in the field continuously<br />

during the 10-day MTT training period. The students' positions

in crews were rotated frequently, and new individuals were chosen<br />

to serve as platoon leader, platoon sergeant, and TCs after most<br />

exercises. Usually each student served once in both the platoon<br />

leader and platoon sergeant positions during the MTT in either<br />

order. The sequence of events in field exercises was like that<br />

used in SIMNET. The Team Chief gave a company mission order, and<br />

then the leader and sergeant planned, prepared, and executed the<br />

platoon mission under the command of the Team Chief.<br />

Measures<br />

Crew instructors rated performance of students acting as<br />

platoon leaders or platoon sergeants in the field exercises, with<br />

final review and approval of the ratings by the Team Chief.<br />

Elements of planning, movement and control, and conduct of the<br />

operation were rated on a three-point scale. More than 80% of<br />

the ratings were in the middle (average or satisfactory) category,<br />

showing a strong central bias. Ratings coded as 1, 0, and -1<br />

were averaged for 17 items, omitting items judged "not applicable"<br />

in a particular exercise, to form a field performance index<br />

ranging between ±100, with zero set at the middle scale category.

The number of ratings was also used to indicate the relative<br />

number of field exercises completed in a platoon. The number of<br />

ratings roughly corresponds to twice the number of exercises.<br />

Separate counts were made for the categories of elementary<br />

movement and contact exercises, and for advanced exercises.<br />

At course graduation, the crew instructors evaluated overall<br />

leadership qualities exhibited by the students during the platoon<br />

tactics phase of the AOB course. Team Chiefs also reviewed and<br />

approved these Comprehensive Student Evaluation ratings. The<br />

ratings showed a strong ceiling effect, with over 90% of the<br />

ratings judged in the highest category ("yes," indicating the

student possessed the rated quality) on a three-point scale. The<br />

platoon average percentage of 10 items given the "yes" rating was

used as a measure of graduate quality. The inverse sine transformation<br />

was applied to the percentages before analysis.<br />
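A small worked illustration of these two scoring rules follows (all values are made up for the example; the transform is written as arcsin(sqrt(p)), the usual form for proportions, which is an assumption since the paper says only that an inverse sine transformation was applied):

    import numpy as np

    # Field performance index: 17 items rated -1 / 0 / +1, with items judged
    # "not applicable" entered as NaN and omitted from the average.
    ratings = np.array([1, 0, 0, 0, -1, 0, 1, 0, 0, np.nan, 0, 1, 0, 0, 0, -1, 0])
    field_index = 100.0 * np.nanmean(ratings)   # ranges -100..+100, 0 = middle category

    # Graduate quality: platoon percentage of 10 leadership items rated "yes",
    # transformed before the trend analysis.
    pct_yes = 0.90                              # hypothetical platoon average
    angle = np.arcsin(np.sqrt(pct_yes))         # assumed arcsine (angular) transform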

Statistical Analyses<br />

A quasi-experimental comparison of time trends between the<br />

baseline and SIMNET-trained groups was used to assess transfer<br />

effects from the added training. The date of graduation for each<br />

class was the main independent variable in regression analyses.

The effects of primary interest were changes in intercept and<br />

slope of the trend over time from those shown by the baseline

152

platoons. Team Chiefs, coded as dummy control variables, were<br />

used to partial out differences in platoon averages associated<br />

with instructor teams. Other variables in the analysis of field<br />

exercise ratings were leader position and day of MTT. Effects of<br />

these variables were found to be independent of the time trends,<br />

and are not presented here. See Bessemer (In publication) for<br />

further details on the analyses and statistical results.<br />
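A bare-bones sketch of an interrupted time-series regression of this kind is shown below (synthetic placeholder data; the actual analysis also carried Team Chief dummies, leader position, and day of MTT, and is documented in Bessemer, In publication). The post-change indicator estimates the change in intercept, and its product with time estimates the change in slope:

    import numpy as np

    rng = np.random.default_rng(1)

    # week: class graduation week from 1 January 1988 (hypothetical spacing);
    # post: 1 for classes graduating after SIMNET training was added (~week 68).
    week = np.arange(2, 102, 3, dtype=float)
    post = (week > 68).astype(float)

    # Synthetic platoon measure, generated only so the example runs.
    y = 5 + 0.02 * week + 4 * post + 0.10 * (week - 68) * post + rng.normal(0, 2, week.size)

    # Design matrix: intercept, baseline slope, change in intercept, change in slope.
    X = np.column_stack([np.ones_like(week), week, post, (week - 68) * post])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    intercept_change, slope_change = beta[2], beta[3]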

Results<br />

In baseline AOB classes, the number of movement evaluations,<br />

but not contact evaluations, declined for elementary exercises, as

Figures 1 and 2 show. The total elementary evaluations in Figure<br />

3 combine these categories. The baseline change reflects efforts<br />

made to conserve training resources. Contact evaluations were<br />

reduced further in classes with SIMNET and HMMWV training. In

contrast, for baseline classes in Figure 4, evaluations counted<br />

in advanced exercises showed no trend. These evaluations then

increased in number after the added training began. Thus, SIMNET<br />

and/or HMMWV training produced some immediate savings in the

amount of elementary MTT training, which then was replaced by<br />

more advanced training exercises in the later AOB classes.<br />

The effect for field ratings shown in Figure 5 was like that<br />

for advanced evaluations. Average student ratings across classes<br />

gradually increased after the SIMNET and HMMWV training was added to the AOB course, indicating that positive transfer to performance in

the student's initial MTT exercise emerged in later classes.<br />

For the graduate quality measure, the best-fitting trends<br />

shown in Figure 6 were not quite significant. Results for the<br />

first SIMNET-trained class are aberrant owing to a change in the<br />

wording of the rating scale in the next class. Omitting the<br />

first class after the baseline, a rank-sum test showed that<br />

graduate quality increased significantly in later classes.<br />

Discussion<br />

The tactical training added to the AOB Course was associated<br />

with three major effects. First, elementary contact exercises<br />

conducted in the MTT decreased in number, and were gradually<br />

replaced by additional advanced exercises involving defense and<br />

offense missions. Second, positive transfer in terms of improved<br />

field exercise performance in the MTT emerged gradually after the<br />

pre-MTT training was expanded by SIMNET training and HMMWV field<br />

exercises. Third, there were indications that the transfer<br />

effect persisted to enhance the judged quality of AOB graduates,<br />

at least for the last classes examined. Careful consideration of<br />

several possible confounding factors led to the conclusion (see<br />

Bessemer, In publication) that SIMNET training, rather than HMMWV<br />

training, was largely responsible for the observed transfer<br />

effects. The gradual emergence of these effects over an extended<br />

time was interpreted as reflecting the accumulation of instructor<br />

experience in using SIMNET to train platoon tactics.<br />

153


Figure 1. Adjusted number of performance evaluations per platoon for AOB students in movement exercises during MTT. (Horizontal axis for Figures 1-4: week from 1 January 1988, with the baseline and SIMNET training periods marked.)

Figure 2. Adjusted number of performance evaluations per platoon for AOB students in contact exercises during MTT.

Figure 3. Adjusted number of performance evaluations per platoon for AOB students in elementary MTT exercises.

Figure 4. Adjusted number of performance evaluations per platoon for AOB students in advanced MTT exercises.

This evidence for positive transfer helps the Army to show<br />

that its investment in networked simulation devices has value for<br />

officer school training. More importantly, these findings have<br />

significant general implications for how the Army conducts device<br />

training effectiveness tests, and how it uses devices. The value<br />

of training devices may be seriously underestimated in tests if<br />

trainers are not allowed sufficiently extended experience to<br />

learn how to train effectively using the device. Instructors in<br />

many tests have only been taught to operate the device, and have<br />

trained few soldiers on the device before training the test<br />

154<br />



Figure 5. Adjusted mean performance rating by platoon for AOB students in their first exercise rated during MTT. Rating limits are ±100. (Horizontal axis: week from 1 January 1988, with the baseline and SIMNET training periods marked.)

Figure 6. Adjusted mean arcsine percentage of items rated "yes" on the Comprehensive Student Evaluation for AOB platoons. Angle limits are ±π/2.

sample. Quasi-experimental test designs can help overcome this<br />

problem, as well as limited statistical power imposed by small<br />

sample size. Many military training exercises are performed<br />

repeatedly by units in an annual training cycle. Collection of<br />

training records and appropriate performance measures can provide<br />

a large sample of baseline data to compare with results achieved<br />

with new training devices.<br />

The full benefits of training will not be obtained from<br />

fielded devices without consistently giving every trainer adequate<br />

experience to learn how to train most effectively. Turnover<br />

in unit trainers, and infrequent device use are factors that<br />

work against keeping instructor experience at a high level, and<br />

reduce the potential effectiveness of device training.<br />

References<br />

Bessemer, D. W. (In publication). Transfer of SIMNET training in the Armor Officer Basic Course (ARI Technical Report). Alexandria,

VA: U.S. Army Research Institute for the Behavioral<br />

and Social Sciences.<br />

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation:<br />

Design and analysis issues for field settings. Chicago: Rand

McNally.<br />

Gound, D., & Schwab, J. (1988, March). Concept evaluation program of Simulation Networking (SIMNET) (Final Report, TRADOC TRMS No. 86-CEP-0345). Fort Knox, KY: U.S. Army Armor and Engineer Board. (Available from Commander, U.S. Army Armor Center and Fort Knox, ATTN: ATZK-DS, Fort Knox, KY 40121-5180)

155<br />

. .


CONTINGENCY TASK TRAINING SCENARIO GENERATOR

1Lt Todd S. Dart<br />

2Lt Jody A. Guthals<br />

Maj Timothy M. Bergquist

Air Force Human Resources Laboratory<br />

INTRODUCTION<br />

The Contingency Task Training (CTT) project was directed at determining<br />

critical skills necessary in wartime or during mid to low-intensity conflicts.<br />

Subsequently, this knowledge would be used for training. The Air Force Human

Resources Laboratory (AFHRL) was tasked with developing the methodology at the<br />

request of Headquarters Air Training Command (HQ ATC) and the U.S. Air Force<br />

Occupational Measurement Center (USAFOMC). The concept for CTT originated in a<br />

study performed by USAFOMC in 1979, entitled the Air Base Ground Defense Tactics

Analysis. A task survey for security police (SP) personnel was combined with a<br />

simple scenario in order to determine which tasks are more important during a<br />

given situation. The study was highly effective in restructuring the SP field,

so much so that HQ ATC requested the technology be developed for combining task<br />

surveys with contingency scenarios. USAFOMC in turn produced Request For

Personnel Research (RPR) 84-02, 'Contingency Task Training Requirements,' asking

AFHRL to develop and validate the contingency technology.<br />

AFHRL started the CTT project in 1988. In order to develop, test, and<br />

validate scenarios for use with task surveys, the project was divided into two<br />

phases. Phase I was the development of the scenario generation technology.<br />

Phase II involved coupling scenarios to task surveys. The scenario and task<br />

survey would then be sent to senior noncommissioned officers (NCOs) who would

review the scenario and rate each task listed for their respective jobs as to<br />

training emphasis. The results would then be validated against Specialty<br />

Training Standards (STS) which list skills each airman is to be instructed in to

reach certain levels of proficiency. Some of the skills in the STS are marked<br />

with an asterisk signifying tasks to be taught during wartime. All other tasks<br />

not marked are to be dropped from instruction. The method of choosing which

tasks to mark has always been left up to the senior NCO. In the past, marking<br />

wartime skills has been done at the last minute during course re-evaluation.

Also, marked skills have never been validated.

The purpose of the CTT project was to provide a method to validate wartime

skills. AFHRL undertook the task of creating scenario generation technology and<br />

subsequent validation via task surveys. AFHRL has now completed Phase I of the<br />

project (Dart, 1990). The Phase II task survey will be performed by USAFOMC.

CRITERIA<br />

HQ ATC and USAFOMC were consulted to determine exactly what the scenario

generator should comprise. Initial research indicated a need for a scenario<br />

generator able to generate both natural disaster scenarios and conflict/wartime<br />

scenar i 0s. Later, the focus changed to conflict/wartime scenarios only. A<br />

disaster scenario generator used in developing training for a disaster situation<br />

occurring infrequently in a small region would not be cost effective. The<br />

technology should concentrate on the mission of the Air Force, national defense,<br />

and the implementation of U.S. Armed Forces as part of national policy.<br />



The scenario must be short, concise, and realistic. A poorly written credible scenario is better than a well written unbelievable one. According to experts in scenario generation, a scenario should provide the minimum amount of information to describe the situation (deLeon, 1973). People can only absorb a finite amount of data, and fine detail may distract the reader from the overall intent of the scenario. Only critical scenario variables had to be selected. The scenario descriptions are intended for use with task surveys; hence, they must "paint" a conflict situation with application to all Air Force Specialties (AFSs).

Consideration of the user group is also important. The CTT scenario generator is intended for use by people inexperienced with creating a scenario. Additionally, the primary user group, USAFOMC, is relatively small.

To ensure necessary scenario generation guidelines are followed, and because of user inexperience, the scenario generator should be automated. The optimal design, with users in mind, would be a small program operable on DOS compatible microcomputers. The inclusion of an on-line help facility which would provide definitions of all contingency variables was also deemed important.

APPROACH<br />

Initial Research. Existing scenario generators were investigated prior to any development. Typically, scenario generators are used in war games. They mainly deal with overall battle management as opposed to individuals. Therefore, the standard scenario generator used for combat tactics was of little to no use for CTT.

A preliminary scenario design system was being developed by the U.S. Army for training intelligence gathering skills to intelligence officers. The Low Intensity Conflict Study Group of the U.S. Army Intelligence Center and School at Fort Huachuca, AZ developed a non-automated scenario generator for creating low-intensity conflict (LIC) scenarios (Smiley, 1989). Since the material suited the needs of the CTT project, permission was obtained to use the variables in the CTT scenario generator.

The Army's material was appropriate for use in a LIC scenario. Future warfare is forecast to be primarily in the LIC arena, but may also include "normal" or high-intensity conflicts and mid-intensity conflicts such as Vietnam. The CTT scenario generator enhanced the Fort Huachuca version to include variables pertinent to all levels of combat. Also, certain definitions and variables were modified to apply directly to the Air Force and its mission. The different levels of conflict intensity provided structure for further scenario generator development. Definitions of high, mid, and low-intensity conflicts were extracted from Army FM 100-20, and are listed in Table 1.

Numerous other sources also provided input into the scenario generator design. Work done by the AFHRL Logistics and Human Factors Division (LR), called the Combat Maintenance Capability, provided information on collecting contingency skills information (Dunigan et al., 1985). They had developed a methodology to determine wartime maintenance tasks. Maintenance specialists were asked to indicate work unit codes (WUC) used to indicate repairs performed on aircraft. The scenario was set at Hahn AB, Germany during a Warsaw Pact offensive. Their study provided information helpful to contingency scenario design and task data collection. The Combat Maintenance Capability study evaluated several computer models, the most notable being the Logistics Composite Model (LCOM), the Theater Simulation of Airbase Resources (TSAR), and the Theater Simulation of Airbase Resources inputs using AIDA (TSARINA).


HIGH INTENSITY CONFLICT is war between two or more nations and their respective allies, if any, in which the belligerents employ the most modern technology and all resources in intelligence; mobility; firepower (including nuclear, chemical, and biological weapons); command, control, and communications; and service support.

MID-INTENSITY CONFLICT is war between two or more nations and their respective allies, if any, in which belligerents employ the most modern technology and all resources in intelligence; mobility; firepower (excluding nuclear, chemical, and biological weapons); command, control, and communications; and service support for limited objectives under definitive policy limitations as to the extent of destructive power that can be employed or the extent of geographic area that might be involved.

LOW INTENSITY CONFLICT is internal defense and development assistance operations involving actions by U.S. combat forces to establish, regain, or maintain control of specific land areas threatened by guerrilla warfare, revolution, subversion, or other tactics aimed at internal seizure of power.

Table 1. CONFLICT DEFINITIONS

Additional information on LIC scenarios came from the Army-Air Force Center For Low Intensity Conflict (CLIC) and the Joint Warfare Center.

Work to determine medical wartime tasks is being done by the Medical Wartime Hospital Integration Office (MWHIO) at Fort Detrick, MD. The project, titled WARMED, is designed to determine the critical wartime skills needed by medical personnel (Meinders, 1987). Concerns by WARMED directors that the CTT project would overlap their own results and recommendations led AFHRL to avoid the medical field entirely in the scenario generator design.

Other information sources included the Air Training Command Office of Wartime Plans (ATC/DPX) and the Headquarters Air Force Management Engineering Agency (HQ AFMEA), Wartime Manpower Division. HQ AFMEA was concerned about common tasks, those critical for a wartime situation yet performed by all AFSs. For example, personnel in any specialty should know the tasks required for donning protective chemical gear. Most common tasks are survival skills in which everyone should be trained. The Air Force, while training some common tasks, does not have an active program of ensuring common tasks are learned and maintained by all personnel. The Army does have such a program and routinely tests all soldiers' skills listed in a series of pamphlets appropriately entitled the Soldier's Manual of Common Tasks (STP 21-1-SMCT, 1987). The concept of common tasks and peacetime tasks is best illustrated in Figure 1.

Preliminary Design. The result of the literature review and consultations was a manual scenario generator consisting of several categories with pertinent variables. A variable dictionary was also created to aid in choosing the correct variables for a scenario. Variables not found in the Army's LIC generator but included in the CTT scenario generator are factors describing the environment in more detail. Choice of variables was based on deLeon's work which recommended appropriate material to include in any scenario.

Once a written version of the scenario generator was developed, the necessity and later feasibility of an automated process became apparent. A computer program, designed to create the scenario, greatly enhances the speed and consistency of scenario generation.



Figure 1. AFS Task Relationships (peacetime tasks and contingency tasks, with contingency tasks spanning low-, mid-, and high-intensity conflict under conventional and nuclear/biological/chemical conditions).

Scenario Generator. Pascal was selected as the programming language since it provided the necessary versatility and relative ease of use. The program was written with the use of Turbo Pascal version 4.0 and is very simple to use. The program operation speed increases when it is loaded onto a hard drive or RAM drive. It will run on any IBM (DOS) compatible microcomputer with 512K RAM. A color monitor is recommended but not required. The program consists of 2 files and is easily contained on a single 360K floppy disk.

The main program, CTT.EXE, is a menu driven program. It presents all the variable categories developed for the manual version. In a step-by-step process beginning with the selection of intensity level, variable categories are presented with the specific variables listed for user selection. Table 2 provides a listing and description of the variable categories in the program. There are nine categories possible for high intensity conflict and eight for mid intensity. Low-intensity conflict has twelve categories, four more than for mid intensity, due to the complex nature of LICs.

Variables are entered one at a time and stored in the computer until the<br />

last variable is selected, whereupon all variables are placed in a standard<br />

scenario format. In most cases, the variables are listed in a sentence format<br />

with no additional information. However, in a few of the variables additional<br />

information is drawn from a small ‘library’ within the program and displayed in<br />

the final scenario. This feature serves to enhance the quality of the scenario<br />

produced. Time constraints prevented displaying additional information for all<br />

variables, although such an improvement is recommended in future development.<br />
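To make the generation flow concrete, the sketch below shows one way such a step-by-step, menu-driven generator can be organized: one selection is stored per category, amplifying text is pulled from a small library for certain selections, and the stored choices are finally rendered as a short scenario that can be printed or written to an ASCII file. This is a minimal Python illustration of the approach described above, not the Turbo Pascal source of CTT.EXE; the category names, options, and library entries shown are invented for the example.

```python
# Minimal sketch of a menu-driven contingency scenario generator.
# Illustrative only: the categories, options, and "library" text below are
# invented for the example and are not taken from CTT.EXE.

CATEGORIES = {
    "Conflict Intensity": ["low", "mid", "high"],
    "Terrain": ["desert", "mountain", "urban"],
    "Season": ["spring", "summer", "fall", "winter"],
    "Mission Duration": ["30 days", "90 days", "180 days"],
}

# Amplifying text attached to a few variables, as the program's small
# internal "library" does for selected choices.
LIBRARY = {
    "urban": "Operations take place among a dense civilian population.",
    "winter": "Expect short days, low temperatures, and degraded mobility.",
}

def choose(category, options):
    """Present one category and return the user's selection."""
    print(f"\n{category}:")
    for number, option in enumerate(options, start=1):
        print(f"  {number}. {option}")
    while True:
        reply = input("Select a number: ").strip()
        if reply.isdigit() and 1 <= int(reply) <= len(options):
            return options[int(reply) - 1]

def build_scenario():
    """Step through every category, then render the stored selections."""
    selections = {name: choose(name, opts) for name, opts in CATEGORIES.items()}
    lines = [f"{name}: {value}." for name, value in selections.items()]
    lines += [LIBRARY[value] for value in selections.values() if value in LIBRARY]
    return "\n".join(lines)

if __name__ == "__main__":
    scenario = build_scenario()
    print("\nGENERATED SCENARIO\n" + scenario)
    # Like the program described above, the result can also be written to an
    # ASCII file and edited in any word processor.
    with open("scenario.txt", "w") as outfile:
        outfile.write(scenario)
```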

The second file, CTT-REV.HLP, contains variable definitions. This on-line "help" function is a useful feature of the program. When accessed, it provides a complete definition of all variables listed in the scenario generator. Supplementary information for many of the variables is also provided. This file must be loaded on the same disk with the scenario generator program to be accessed.

The program will allow the user to generate low, medium and high-intensity<br />

conflicts. Special emphasis is given to low intensity conflicts as they are<br />

typically the most intricate. Interestingly, the low-intensity conflict is<br />

rapidly becoming the most common conflict the U.S. will face in coming years.<br />



-Conflict Intensity: choice of the degree of conflict intensity.
-Highest NBC Threat Level (High Intensity Only): description of the amount of nuclear, biological, and chemical clothing worn.
-Attrition: the amount of critical personnel and equipment damaged/wounded and destroyed/killed per month.
-Logistics: the amount of critical supplies required to perform the mission which actually reach the combat area.
-Environment
-- Area and Size: the size or area of operations.
-- Sub-Terrain: choice of three areas of man-made environments.
-- Terrain: choice of terrain combined with season presents a detailed climatic description.
-- Season: choice of four seasons.
-Mission Duration: the length of time the scenario will last.
-Command And Control: presents choices for particular commands or joint operation.
-Low Intensity Conflict Only
-- LIC Situations: eight of the most common types of situations.
-- Operational Category: describes the general intent of the military operation undertaken to either combat or facilitate the LIC situation.
-- Threat Type: type of threat US forces are expected to face during the LIC situation.
-- Threat Support: the type of popular support the threat will receive.

Table 2. VARIABLE CATEGORIES

The final scenarios produced by the program are short and simple as was<br />

specified by experts in scenario design. The user has options to print the<br />

scenario or send it to a computer disk file. If copied to a disk the scenario<br />

can then be modified using any word processor program that reads ASCII.<br />

The Contingency Scenario Generator User’s Manual (Dart & Guthals, 1990)<br />

provides additional information on the variables and the use of the program.<br />

VALIDATION

Evaluation involved conducting what was termed a "reality check". To perform the reality check, the program was taken to several wartime planning offices.

The HQ ATC Technical Training (HQ ATC/TTIR) division, HQ AFMEA, and the School of Aerospace Medicine, Battlefield Readiness (USAFSAM/EDO) office, Brooks AFB, were asked to review the program and provide input into its improvement. In addition to the above mentioned sources for scenario evaluation, other sources were contacted concerning specific aspects of the generator. Most notable was the value for attrition given in the scenario. The Air Force Wartime Manpower, Personnel and Readiness Team at Fort Ritchie, MD provided valuable information in this regard.

The evaluation of the scenario generator by war planning experts led to several recommendations for further development. Those that were easy and straight-forward to implement in the time available were incorporated into the scenario generator. Unfortunately, to implement several recommendations would have involved complicated procedures or a major reprogramming of the generator. Therefore, although they would enhance the generator, those recommendations were not implemented.

CONCLUSION<br />

During the CTT project a methodology was developed to design contingency scenarios. They can be used with task surveys to identify wartime tasks and, subsequently, the needed training requirements. The project makes use of the latest information in scenario design and variable definition from both the Air Force and the Army.

Phase I of the CTT project has been completed. The CTT scenario generator has proven to be successful in its attempt to provide a suitable contingency scenario. In fact, while the program was originally designed for use with task surveys at USAFOMC, it has already been adopted by USAFSAM/EDO for designing scenarios for contingency instruction of medical officers.

Phase II of the CTT project, determining wartime skills through task surveys, will be undertaken and completed as appropriate by USAFOMC.

REFERENCES<br />

Army Field Manual 100-20 (1988). Low Intensity Conflict. Washington, D.C.: Department of the Army.

Dart, T. S., & Guthals, J. A. (1990). Contingency Scenario Generator User's Manual (AFHRL-TP-90-74). Brooks AFB, TX: Manpower and Personnel Division, Air Force Human Resources Laboratory.

Dart, T. S. (1990). Contingency Task Training (AFHRL-SR-90-73). Brooks AFB, TX: Manpower and Personnel Division, Air Force Human Resources Laboratory.

deLeon, P. (1973). Scenario Designs: An Overview (R-1218-ARPA). Santa Monica, CA: Rand Corporation.

Dunigan, J. M., Dickey, G. E., Borst, M. B., Navin, D., Parham, D. P., Weiner, R. E., & Miller, T. M. (1985). Combat Maintenance Capability: Executive Summary (AFHRL-Z-85-35). Wright-Patterson AFB, OH: Logistics and Human Factors Division, Air Force Human Resources Laboratory.

Meinders, M. (December 1987). Talking Paper on Wartime Medical (WARMED) Work Center Description (WCD). Fort Detrick, MD: Medical Wartime Hospital Integration Office (MWHIO).

Smiley, A. A. (January 1989). Low Intensity Conflict Scenarios. Fort Huachuca, AZ: Low Intensity Conflict Study Group, U.S. Army Intelligence Center and School.

Soldier Training Publication 21-1-SMCT (October 1987). Soldier's Manual of Common Tasks. Washington, D.C.: Department of the Army.

USAFOMC (April 1979). Air Base Ground Defense Tactics Analysis (AFPT 90-811-137, 90-812-138). Randolph AFB, TX: Occupational Survey Branch, USAFOMC.



COOPERATIVE LEARNING IN THE ARMY<br />

RESEARCH AND APPLICATION<br />

Angelo Mirabella<br />

U.S. Army Research Institute for the<br />

Behavioral and Social Sciences<br />

Cooperative learning (CL) is something many of us did in<br />

college when we took chemistry, physics, or calculus - courses<br />

built around problem solving exercises. We joined with a few other<br />

students to do homework or study for a test. We shared our<br />

understandings of the problems, helped each other correct<br />

misconceptions, and then reached consensus on how to solve the problems. Today, formally organized cooperative learning groups in classroom settings do the same things. Therefore, CL is not new

or revolutionary. Yet, as a formal, institutionalized philosophy<br />

and methodology of instruction, it has been slow to take root,<br />

especially in the military. Public primary schools have progressed<br />

further. In Columbia, Maryland, for example, there is an elementary<br />

school whose classes are taught completely according to CL<br />

principles. The teacher primarily facilitates the work of small<br />

groups. The students and their activities rather than the teachers<br />

are the centers of attention.<br />

It is ironic that CL has taken so long to root since the more<br />

traditional teacher-centered approach, and even more modern<br />

individualized approaches are social arrangements contrary to what<br />

is demanded of people once they leave the school house (Raspberry,<br />

1987, 1988). This contradiction was especially blatant in my<br />

elementary school days when talking among students was punished<br />

with demerits, detention, or dreaded calls to parents to come to<br />

school for a conference. I remember vividly a visit by my father,<br />

who had to lose a day's pay to learn from my teacher that I talked<br />

too much in class. I also recall the back of his hand when I tried<br />

to explain my behavior and give my side of the story. Anyone who<br />

knows me would be astonished to learn that I once suffered from<br />

talkativeness. I often wonder what destructive social conditioning<br />

was imposed by such an environment.<br />

In contrast, cooperative learning implicitly recognizes that<br />

interpersonal relationship, i.e. communication, is one of the most<br />

pervasive and critical sets of skills anyone can learn. And what<br />

better place to foster such skills than the school house.<br />

Cooperative learning is, at the same time, an effective way to

develop other skills and thereby stretch instructional resources.<br />

At least that is the emerging conclusion of many years of basic<br />

research and some very preliminary applied research by the Army<br />

Research Institute. I'm hedging a bit because CL in the Army is in<br />

its infancy, and the work to be reported, while providing<br />

converging evidence for the effectiveness of CL, was not done in<br />

pristine, antiseptic laboratories.<br />



I'd like to start with a brief overview of basic research on CL and then review efforts by the Army to test the methodology in Training and Doctrine Command (TRADOC) schools.

McNeese (1989) provides a useful tabulation of CL research. He cites Slavin's 1983 review showing that of 46 studies, 29 had shown favorable effects, 15 no differences, and 2 showed advantages for "traditional" education. Johnson & Johnson (1985) reported that out of 26 studies 21 were favorable, two showed mixed results, and 3 no differences: on balance, strong support for the value of CL. However, the reviews suggest that merely grouping students is not enough. Students do have to cooperate. Slavin goes further and says that group incentives, coupled with individual responsibility, are essential.
Otherwise, CL works,in many different circumstances. Johnson,<br />

Maruyama, Johnson, Nelson, and Skon (1981), in a review of 122<br />

studies extending from 1924 to 1981, found that CL was effective<br />

across a range of ages, subjects, and tasks. What emerges, from<br />

these reviews, is a type of conclusion often found for new<br />

performance technology. Cooperative Learning can be effective if<br />

properly designed. This conclusion applied to academic settings.<br />

Would it also apply to Army Schools?<br />

Research to answer this question was done at Fts. Lee and Knox and implemented at Ft. Lee under TRADOC-ARI partnerships called Training Technology Field Activities (TTFAs). I first want to say a word about these, to provide a perspective on why this research was undertaken. In 1983 the TRADOC Commanding General concluded that his schools were not capitalizing on a steady stream of new ideas and technology emerging from the training R & D community. He wanted to establish a formal link from basic research to the Army's training community. Accordingly he invited ARI to join with selected schools in TTFAs. Their purpose was to test new training technology, on significant Army problems, using TRADOC testbeds. The schools and TRADOC HQ were to lead in identifying test bed problems, while ARI was to lead in identifying a prototype research-based solution. The partners were then to join forces in testing the solution.

Activities were established at several schools including<br />

Quartermaster at Ft. Lee, Virginia and Armor at Ft. Knox, Kentucky.<br />

Cooperative learning projects were undertaken at these schools<br />

because basic research had shown that CL can be very efficient. But<br />

it had to be proven and implemented in Army settings. I'll mention<br />

the Knox work briefly and then focus on the work at Lee, since this<br />

was implemented and is still being used.<br />

Shlechter (1987) at Ft. Knox compared training effectiveness<br />

for cooperative groups of 2 or 4 students, and for individuals in<br />

the 19K MOS (Tank Commanders). From computer-based instruction (on<br />

MICROTICCIT) each student had to learn to interpret radio call<br />

signs and communicate in coded messages, tasks for which<br />

performance deficiencies had been documented.<br />



Improvements in performance were the same across training conditions, but the 4-student groups needed only two-thirds the time required by individuals to achieve comparable performance. Individuals and 2-student groups were statistically the same here. Both the 4- and 2-student groups made substantially fewer demands on instructor time as measured by "calls for proctor assistance", e.g., 0 and 27 respectively compared to 115 calls from individuals.

At the Ft. Lee TTFA, Hagman and Hayes (1986) examined the effectiveness of cooperative methods in a more traditional, non-computer-based setting. They wanted to define specific conditions under which CL would and would not work. From a review of the literature, they hypothesized that effectiveness of CL increases with increasing group size, though only when incentives were provided which encourage group members to share knowledge.

Subjects were drawn from one unit (annex) of instruction in the 76C MOS Advanced Individual Training (AIT) course for supply clerks. Within this unit, the students receive a series of lectures each followed by a practical exercise (PE). Midway and at the end of the unit, the students are individually tested. Students who fail go to study hall for remediation. Those who fail a retest are "recycled", i.e. required to repeat the annex.

For the experiment, students were assigned to one of 3 group-size conditions. They did the PEs alone, in groups of 2, or in

groups of 4. The groups of 2 and 4 were further divided into two<br />

incentive conditions. Under one condition (Group Incentive), if any<br />

student in the group failed, every group member went to study hall.<br />

Under a second condition (Individual Incentive) only the failing<br />

student went to study hall. Hagman and Hayes predicted that under<br />

a group incentive (i.e. everyone to study hall), performance would<br />

increase as group size increased, but decrease with increasing<br />

group size under individual incentive.<br />

Results partially supported this prediction. For each of the two tests in the annex, groups of four were clearly optimal under the group incentive. But statistically groups of two did not outperform individuals. Recall that Shlechter had found a similar result at Ft. Knox. These similar results suggest, as a preliminary conclusion, that CL groups should contain more than two people. Other results supported the value of CL, but were inconsistent with the main hypothesis of the experiment. During the PEs, cooperative groups made fewer errors than did individuals, with or without the group incentive. In fact incentive made no difference at all.

A potentially negative effect of CL was that groups took longer to complete PEs than did individuals. This is not surprising, since CL requires time for students to exchange information and ideas. However, if the added time does not exceed reasonable amounts of available instructional time or is offset by benefits, it can be discounted. Both conditions were satisfied in this study.


Brooks et al (1987) did follow-up research, using the entire<br />

76C course as a test bed, to assess further the benefits of<br />

cooperative learning. This was actually a full-scale implementation

test with an additional measure: recycle rate. Recycle rate is the<br />

percentage (per course) of students who fail end-of-annex tests,<br />

attend study hall for remedial review of material, fail a second<br />

time, and then repeat the annex. All students in three cooperative<br />

classes worked in groups of four - 34 groups for a total of 136<br />

students. These were compared with students in three other,<br />

regularly scheduled and conducted classes with a combined<br />

enrollment of 128 students.<br />

Results. The bad news was that the Hagman and Hayes finding of improved test scores for groups of four compared to individuals was not seen by Brooks et al. Aggravating the bad news was the agreement with Hagman and Hayes that CL students took longer than individual students to complete PEs, though here again they finished within the allotted training time. The investigators checked to see if a treatment-aptitude interaction might be buried in the data. They divided subjects into high and low scorers on the ASVAB Clerical scale, but found no interaction with training method.

The good news was that CL students made fewer errors in the PEs (as in the Hagman study) and that the recycle rate was reduced from 10.9% to 4.4%, i.e. 60% lower for CL students than for individuals. Brooks et al extrapolated this saving to a year's worth of classes (about 3,000 students) and estimated a cost reduction of $136,000. Not a large sum in the bigger scheme of things, but if CL were implemented Army-wide, the savings could be significant. Moreover, achievement scores in CL classes were not worse than in the "conventional" comparison classes. This would be especially constructive for CL in a computer-based classroom because it supports assigning one workstation to 3 or 4 students, thereby reducing the demands for expensive hardware. A potentially positive effect, demand on instructor time, was not assessed, but may have been present. Recall that in the Shlechter studies, CL students required notably less instructor help than did individuals. Finally, students and instructors preferred CL to individual practice.
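A quick back-of-the-envelope check shows how that extrapolation works, assuming (only for illustration) that the avoided cost scales with the number of recycles avoided; the implied per-recycle cost below is inferred, not a figure reported by Brooks et al.

```python
# Rough check of the Brooks et al. recycle-rate extrapolation.
conventional_rate = 0.109      # recycle rate in comparison classes
cooperative_rate = 0.044       # recycle rate in cooperative learning classes
students_per_year = 3000       # approximate annual 76C enrollment cited above

recycles_avoided = (conventional_rate - cooperative_rate) * students_per_year
implied_cost_per_recycle = 136_000 / recycles_avoided   # inferred, not reported

print(f"Recycles avoided per year: {recycles_avoided:.0f}")          # about 195
print(f"Implied cost per recycled student: ${implied_cost_per_recycle:,.0f}")  # about $700
```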

Outcome of the Research. The work by Shlechter, Hagman and Hayes, and Brooks et al, as well as a solid foundation of prior basic research, led ARI to recommend that the Quartermaster School implement cooperative learning. Brooks (1987) wrote an instructor's manual on how to set up and manage a CL classroom. The methodology has since been used in AIT for the 76C MOS. Moreover, if and when computer-based instruction becomes widespread in the Army, this same methodology could save millions of dollars. With multiple students per workstation, the number of required stations could be reduced by two-thirds to three-quarters. And cooperative learning could very well revolutionize the way the Army trains.



REFERENCES<br />

Brooks, J.E. (1987). An Instructor's Guide for Implementing Cooperative Learning in the Equipment Records and Parts Specialist Course (ARI Research Product 87-35). Alexandria, VA: US Army Research Institute for the Behavioral and Social Sciences.

Brooks, J.E., Cormier, S.M., Dressel, J.D., Glaser, M., Knerr, B.W., & Thoreson, R. (1987). Cooperative Learning: A New Approach for Training Equipment Records and Parts Specialists (ARI Technical Report 760). Alexandria, VA: US Army Research Institute for the Behavioral and Social Sciences.

Hagman, J.D., & Hayes, J.F. (1986). Cooperative Learning: Effects of Task, Reward, and Group Size on Individual Achievement (ARI Technical Report 704). Alexandria, VA: US Army Research Institute for the Behavioral and Social Sciences.

Johnson, D.W., & Johnson, R.T. (1985). The Internal Dynamics of Cooperative Learning Groups. In R.E. Slavin, S. Sharan, S. Kagan, R. Hertz-Lazarowitz, C. Webb, & R. Schmuck (Eds.), Learning to Cooperate, Cooperating to Learn. New York: Plenum.

Johnson, D.W., Maruyama, G., Johnson, R.T., Nelson, D., & Skon, L. (1981). The Effects of Cooperative, Competitive, and Goal Structures on Achievement: A Meta-analysis. Psychological Bulletin, 89, 47-62.

McNeese, M.D. (1989). Explorations in Cooperative Systems: Thinking Collectively to Learn, Learning Individually to Think (AAMRL-TR-90-004). Wright-Patterson Air Force Base, Ohio: Armstrong Aerospace Medical Research Laboratory.

Raspberry, W. (1987, September 29). Why Should Kids Compete in Class. Washington Post.

Raspberry, W. (1988, August 1). From School to the Real World. Washington Post.

Shlechter, T.M. (1987). Grouped Versus Individualized Computer-Based Instruction (CBI) Training for Military Communications (ARI Technical Report 1438). Alexandria, VA: US Army Research Institute for the Behavioral and Social Sciences.

Slavin, R.E. (1983). When Does Cooperative Learning Increase Achievement? Psychological Bulletin, 94, 429-445.



BATTLE-TASK/BATTLEBOARD TRAINING

APPLICATION PARADIGM AND RESEARCH DESIGN<br />

John C Eggenberger PhD, Director Personnel Applied Research and Training Division,<br />

SNC Defence Products Limited<br />

Ronald L. Crawford PhD, Professor, Concordia University<br />

1. Introduction<br />

Competitive advantage occurs when one protagonist creates and exploits superior relative certainty in an area which is uncertain or problematic within the industry. Porter, Khandwalla, Waterman & Peters, and others have proposed typologies of ways in which one can gain competitive advantage (viz: product, promotion, investment, scope, etc.), but they give a false impression that these represent institutional, executive level, quantum events. The opposite, in fact, is the typical case. Quinn, Mintzberg, Crawford, Gram, and Star, Cyert, March, Cohen, Drucker and others emphasize that competitive advantage is more typically achieved cumulatively through successions of responses to locally evident ambiguities, threats, opportunities and variations. In other words, through voluntary improvisations which people undertake on their own initiative in relation to disciplined actions.

The improvisation process has been studied extensively by members of the SNC Personnel Applied Research (PAR) team in both military and civilian settings. It corresponds very clearly to the behavioural theory of the firm, consisting broadly of:

• applying heuristic diagnostic and response skills;
• using experimentation to test beliefs, learn more, and influence the constellation of factors; and,
• creating uncertainty among one's competitors.

In typical populations, psychological readiness and capacity to exercise "disciplined initiative", and thence improvise, are statistically uncommon. These populations are characteristically stabilized well before entry to the work force, and are developed over long periods of intensive investment of time, energy, and resources. There is substantial evidence, however, that comparable skills can be achieved by adults, although most current examples tend to be costly and harrowing experiences for the participants. Real or simulated equivalents of combat, for example, do produce high levels of intuitive problem solving and experimental learning, but with significant casualty rates and considerable cost.

In this regard, the PAR team has identified the following:

• a method and content which can be employed in broad based training & development settings to produce effective improvisations from the application of "Disciplined Initiative";
• a reformulation of that content into a format which retains a high level of psychological engagement but reduces the resource requirements and real/psychological casualty rates;
• a refinement of the content and method of instruction into a form suitable for field trial in a military setting; and
• a development from the field trial of parallel curricula tailored to the context of other industry applications and levels of management.

2. Discipline and Initiative

What are the major determinants or sources of initiative? Discipline, on one hand, is acquired by learning how to deliver predictable and standardized outcomes when appropriately cued (certainty). Improvisation, on the other hand, is the delivery of a satisfactory outcome when initiative is exercised, i.e., action is called for but the cues have not been experienced before, nor is there an available repertoire of rehearsed responses to cope with the situation (uncertainty). A far more complete



treatment of these notions in relation to prior research has been done and reported elsewhere.<br />

For the military commander, regardless of appointment, as well as for other vocations, the distinction between "discipline" and "initiative" is important. The commanders of sections, platoons, companies, battalions, divisions, corps, and armies who can handle both the determinate and indeterminate aspects of their responsibilities would appear to project a number of advantages as follows:

• the commander would cope more effectively with both foreseen and unforeseen events,
• the commander would require substantially less attention or supervision, and
• the commander would have a greater capacity to assume authority.

Within the context of the military commander some of the research questions we propose to ask in relation to Discipline, Initiative and the capacity to Improvise are as follows:

• How is discipline developed?
• How is initiative developed?
• How do discipline and initiative interact?
• How does "disciplined initiative" influence improvisation outcomes?

3. The major propositions are listed as follows:

• the more a person experiences intimate, emotional, idealistic and reinforcing socialization experience, the more a person will have the propensity to exercise "disciplined initiative" under conditions of uncertainty;
• the more a person exercises "disciplined initiative" under conditions of uncertainty, the more a person will be able to exploit available options (improvise) in a battle situation;
• the more opportunities the person has to rehearse battle scenarios under controlled conditions, the more the person will exercise appropriate "disciplined initiative" decisions and actions; and,
• the more a person acquires and copes with difficult assignments, the more a person will continue to exercise "disciplined initiative" under conditions of uncertainty.

The matrix at Figure 1 shows the importance of DISCIPLINED INITIATIVE to the Military Commander. Clearly, it is important to design and deliver a curriculum of continuing training and education that will result in the bulk of the Commanders belonging to the upper left quadrant, and none found in the lower right quadrant.

Figure 1. Implications of Discipline and Initiative from the Perspective of the Military Commander (a matrix of high and low discipline against high and low initiative).


4. Situational Awareness and the Military Commander

Situational awareness has also been developed to deal with recent reanalyses of the sorts of thinking that go on under complex and rapidly changing conditions, especially when information inputs and outputs are degraded by blockages and noises of a variety of kinds and intensities. Essentially, the Commander must be able to act upon knowledge of himself and his forces and the disposition of the enemy forces, and anticipate the reaction of the enemy to his initiatives in the context of rapidly changing conditions and timelines.

The basis of the Commander's action is inputted information COMMUNICATED to him, mainly audio (voice) and visual (eye), and outputted action information COMMUNICATED by him, mainly audio (voice) and psychomotor (eye-mind-finger-hand). What is of concern in the production of qualified COMMANDERS is the types and ranges of thinking that must occur in order to decide on, and communicate, courses of action that are appropriate for given scenarios.

5. Training to Tactical and Strategic Actions

The effectiveness of combat elements depends to a great extent upon the ability of their personnel to carry out three kinds of actions:

• Highly efficient enactment of predictable routines, such as mobilizations, preparation for action, decamping, assembly, deployment into and out of movement formation, and establishing formations for classical types of actions. These are activities which recur with regularity in such consistent form that a well-drilled unit literally has them down to a well honed science. These are performed with minimal judgement because the "solution" is already known.

• Applying "Directing Staff Solutions", or "classical tactics", effectively in appropriate field and simulated situations. The clearest illustrations of these are the action sequences or drills in small unit tactical manuals. Those tell the participant what to do and how to do it under most tactical conditions. Directing Staff solutions require an element of active diagnosis of the context (i.e. a military appreciation), a choice among alternative responses from a standardized repertoire, and adaptation of those responses to match situational particulars.

• Improvisation. Patton observed that plans never survive the initial engagement. Substantially the same sentiments of commanders and theoreticians across millennia demonstrate that under firing line conditions, the classical solution sometimes cannot be ascertained, may not apply because of locally evident threats or opportunities (or may even be counterproductive if it represents definitive intelligence for the opposing force).

Under all battle conditions improvisations are required. Typical kinds of improvisation include:

• making tentative and partial diagnoses under uncertainty;
• using action to test diagnoses, clarify the context, and alter the context; and
• creating uncertainty for the opposing force.

6. Relationship of Battle-Task/Battleboard Training to Combat Related Doctrinal Training

Training to doctrine usually takes the following three forms:


a) SCRIPTED ROUTINES, comprised of,
• Rationale; Components; Chained components; Whole (Insight - Gestalt).

b) ADAPTED ROUTINES, comprised of,
• Pattern recognition - more of the same situations.
• Repertoire - more routines and variations.

c) IMPROVISATIONS, comprised of,
• An Act-Watch stream.
• A Convergent stream - process of elimination; working backwards; partial solution; simplified modes; analogues.
• An Enacting stream - via networks; forcing errors; tactics of mistakes.
• A Competitive stream - using edge of certainty; creating uncertainty for others.

7. Scripted Routines

Scripted routines are the action equivalent of commodity strategies, or mass production. They<br />

depend for their effectiveness upon speed, precision, predictability and integration of more or less<br />

complex but fixed routines. Realtime thinking is largely replaced by decision loops and redundancy.<br />

The optimal training scenario for such manoeuvre is the rehearsal. In rehearsals, the “big picture”<br />

(e.g., a drill, movement, parade...) is broken down into its constituent components, such as tasks and<br />

actions. These are rehearsed until the trainee achieves complete command. The components are<br />

strung together in progressively longer trains of action until the entire routine is represented. Psychologically,<br />

the process is a direct application of behavioural conditioning (chaining).<br />

The challenge of teaching scripted routines is that the actions involved tend not to be very exciting<br />

or involving. This requires developing imaginative training methods such as competing against the<br />

clock, against scoring systems, or against other teams. Varying the training content will also help:<br />

here it is important to move from board simulations to field simulations at an early stage (e.g.,<br />

convoying in snowstorms, at night without lights...).<br />

8. Adapted Routines

The fundamental training objectives here are to create repertoires of stored situational patterns, and<br />

to match these with “DS Solutions”, or repertoires of behavioural routines appropriate for each situation.<br />

Unlike scripted events, where a workable outcome is effectively guaranteed by rote enactment<br />

of a fixed recipe (e.g., a parade or unobstructed road movement), enactment of adapted routines<br />

requires making more or less continuous judgements and reassessments. Those are necessary, first,<br />

because tactical situations all differ in important detail, and because they change as actions evolve.<br />

These judgements are also necessary to modulate actions and to maintain unit control and external<br />

coordination. Under operational conditions there is precious little time or attention for anything else.<br />

That means that the "basics" of situational appreciation (information gathering, interpretation, summarizing

in a working model) then identifying and implementing the appropriate tactical response<br />

must be almost reflexive.<br />

The classical approaches to situational assessment and theoretical doctrine actually work reasonably<br />

well. That is, breaking the processes into stages, and then working through numerous examples of<br />

each stage, starting with simple, small scale examples and working up to more difficult and complex<br />



examples then chaining the stages successively into processes. The main problems with contemporary<br />

training are that it is not engaging or realistic enough, that trainers are usually insufficiently<br />

prepared or supported with aids, scenarios and materials, and that the trainers do not experience<br />

enough examples/cycles, and variations, for recognition and response to become second nature. The<br />

ultimate objective is to prepare someone who can act like he has seen the situation before, understands<br />

it, can visualize the opponent’s perspective, and can select and enact the actions needed to<br />

ensure favourable outcomes. The response of a BATTLE-TASK TRAINING/BATTLEBOARD

based curriculum is to create surrogate experience, and to meter that experience at a controlled but<br />

challenging rate of exposure.<br />

9. Improvisations

The objective behaviour here is taking effective local action under uncertainty or ambiguity that obviates rational, calculated decisions. These conditions are commonplace, yet they are poorly addressed (and in some quarters actively denied) in the training curricula. Improvisation accounts for the extent people are able to respond to, understand, exploit, and occasionally create transient, locally evident threats, opportunities, and ambiguities (versus becoming immobilized or simply plowing ahead according to the initial set of orders).

Current knowledge of how people respond under uncertainty concentrates upon the interacting processes of:

• heuristic problem solving,
• learning by experimentation, and
• creating opponent uncertainty & loss of initiative.

To teach to these interacting processes, the simulation scenarios and curriculum are modified. The scenarios follow a simpler progression than pattern recognition and/or repertoire development. These scenarios would directly link simple to complex, small scale to large scale. However, they will be made deliberately ambiguous, with cues of increasing subtlety regarding threats, opportunities, dispositions and intentions. The objectives will be attaining tactical certainty and initiative (i.e., bringing the simulation back to an accepted routine format). The role of the umpire/trainer will be made much more active, as he will effectively be reorienting the scenario to reflect what is learned from each move as well as the objective outcomes. The discussion emphasis will shift from recognition to inference. Physically the simulations will employ progressive disclosure. As capacity develops, options such as fractures in command-coordination can be included. The core issues are:

• how can I think and experimentally work my way into a situation where I know what's going on and can employ my tactical handbook, and
• how can I prevent my opponent from getting to that stage first?

10. The Trials and the Setting

The initial focus of the trials is the Army Reserve Unit. Decentralization, resource constraints, limited personnel time, the Army Reserve Unit's need for an experience payoff which will enhance both the unit and the civilian career opportunities of participants, and the critical role of the Army Reserve Unit in a scenario of future national defence make the Army Reserve Unit a particularly attractive site for the trials. Further favouring the choice of an Army Reserve focus is the availability of the training simulation


device BATTLEBOARD, a robust and transportable table-top terrain modeling simulator, and readily<br />

adapted doctrinally based Battle-Task training scenarios, as well as pre-existing knowledge of the<br />

general nature of tactical uncertainties of combat arms units, and the compressed time frames and clear<br />

field testing which Reserve settings offer.<br />

11. Interrupted Training Schedules

Moreover, the Army Reserve Unit training curriculum has not yet been specifically addressed in terms of the real constraints confronting an Army Reserve Unit. The Army Reserve Unit soldier trains "part time", while the "Regular Force" soldier can train "full time". Courses and exercises are not interrupted for the Regulars while they always are for the Reserve Unit. The usual method of fitting training requirements to Reserve Unit needs is to cut parts out of a curriculum, sequence the course curriculum differently, and/or stretch it out over a longer time period. Clearly these approaches will not be adequate for enlarged Reserve Units in a Total Force Army.

Thus, throughout this project there is a concurrent activity devoted to applying the ingredients of a theory/model of linked learning. This theory/model is needed in order to accommodate the time available for training the Reserve Unit person. Usually this time is available in "dribs and drabs". As a consequence, the curriculum must be parcelled out for the Reserve Unit in such a fashion that the results of training are the same as for the Regulars, who engage the curriculum as a coherent whole.

The core assertion of this linked learning notion is that each BATTLE-TASK (e.g. "Advance to contact" - Infantry alone) is taught "wholistically", in the teams that are in command, using terrain models, with the instructor using the "inductive" mode of instruction. The objective is to increase situational awareness in the team, and enable them to distinguish between "discipline" and "initiative", to increase the team's comprehension of, and use of, "Disciplined Initiative".

12. Action

Figures 2, 3, and 4, overleaf, portray the format of the trials that conforms to the action science research strategy encouraged by Argyris et al. (1985), and the Personnel Applied Research method used by the PAR team.


Figures 2, 3, and 4. The Personnel Applied Research format and basic design: combat formation (Infantry); the manoeuvre (advance to contact); the deliverable (a kill taken from an enemy force); and a chosen tactical doctrine. Combat formations trained using the BATTLEBOARD training system and combat formations trained using current training systems are each taken through a chosen training option (scripted routines, adapted routines, or improvisations), the action, and the result, ending in the comparative analysis.



COMBAT VEHICLE COMMANDER'S SITUATIONAL AWARENESS:<br />

ASSESSMENT TECHNIQUES<br />

Carl W. Lickteig<br />

Major Milton E. Koger<br />

U.S. Army Research Institute<br />

Field Unit-Fort Knox<br />

Captain Thomas F. Heslin<br />

2nd Squadron, 12th Cavalry Regiment

Fort Knox, Kentucky<br />

Abstract<br />

The ability to "see the battlefield" is critical to<br />

successful execution of the battle. This precept is true at all<br />

echelons including commanders of small units and individual<br />

weapon systems. To train and foster this ability, however,<br />

methods for assessing and enhancing the commander's situational<br />

awareness (SA) are required. Recent efforts (Endsley, 1988;<br />

Fracker, 1988) have focused on the development of objective<br />

measures of fighter pilots' SA. This paper extends this effort to

measures of SA for land combat vehicle commanders.<br />

As part of the Army Research Institute's (ARI) program of<br />

research in support of future Combat Vehicle Command and Control<br />

(CVCC) systems, small unit commander's SA was identified as a<br />

potentially important measure of system effectiveness. Parallel<br />

forms of two SA instruments were developed for objective<br />

assessment of a commander's perception, comprehension, and<br />

projection of the battlefield situation. This paper provides a<br />

description of these SA instruments and their utilization in<br />

support of the CVCC simulation-based program.<br />

Background<br />

The combatant's SA represents his knowledge of the world and<br />

his role in it. SA includes both lower and higher order mental<br />

processes ranging from the simple perception of individual<br />

elements of the situation to an assessment of their meaning and<br />

impact on immediate and overall mission objectives. Endsley's<br />

model of SA details three distinct levels--perception, comprehension, and projection--included in the following definition of SA: "...the perception of the elements in the

environment within a volume of space and time, the comprehension<br />

of their meaning, and the projection of their status in the near<br />

future" (Endsley, 1988, p. 97).<br />

For ground forces, SA is more commonly described as the<br />

commander's ability to "see" the battlefield in relation to his<br />

mission and the overall mission. Combined arms combat,<br />

particularly for ground systems, entails coordination and support<br />

of multiple units. Situational awareness for combined arms<br />

commanders must include, perhaps more so than for combat pilots,<br />

the context of the combined mission.<br />



Typically a commander's awareness of a combat situation<br />

begins with the assignment of his unit's mission embedded in the<br />

concept or schema of the overall mission that his unit is<br />

supporting. The mission specifies the area of operations on the<br />

battlefield, the place(s) in the world that the commander is to<br />

occupy, as well as the objectives and time frame driving mission

pace. The mission brief and order of operations describe the<br />

known and suspected enemy forces and activities in that area, key<br />

terrain features and locations related to mission accomplishment,<br />

and friendly combat, support, and service support units<br />

responsible for mission execution.<br />

Once the battle commences, the commander's perception<br />

(Endsley's SA Level 1) of the situation is enhanced by the direct

or reported detection of enemy units. When initial contact and<br />

spot reports are received by the commander, his perception of the<br />

situation must be quickly updated. As a commander, he must also<br />

attempt to comprehend (SA Level 2) this information, its<br />

significance to his unit and mission. Given the reported type<br />

and number of enemy units detected, he may begin to estimate the<br />

size and type of the overall force committed, their weapon<br />

systems and range, their organization and support.<br />

As his understanding of the situation develops, the commander<br />

begins to project (SA Level 3) or reassess probable courses of<br />

action. Given the location and heading of units reported and his<br />

estimate of force structure, he may begin to calculate when, or<br />

if, the main unit will reach his location, at what point he may<br />

need to displace his unit from their current location, and what<br />

impact the current situation will have on the future situation<br />

such as his unit's next proposed location.<br />

This effort in SA development for ground systems is part of<br />

ARI's program of research in support of future CVCC systems.<br />

These systems will provide ground vehicle commanders a unique<br />

capability for the digital communication of text and graphic<br />

battlefield information, in addition to conventional FM radio.<br />

The CVCC program objective is the development of soldier-tested<br />

specifications for future automated command and control systems<br />

for ground combat vehicles. ARI conducts simulation-based tests<br />

of prototype CVCC systems using the Armor Center's Close Combat<br />

Test Bed (CCTB), formerly Simulation Networking Developmental<br />

(SIMNET-D), at Fort Knox.<br />

Simulation-Based Methodology<br />

An objective measure of commander's SA is based on a<br />

comparison of the actual situation with the commander's<br />

assessment or report of the situation. Maintaining an accurate<br />

knowledge of the battlefield situation, however, is difficult for<br />

both commanders and SA researchers. For the latter, simulation-based<br />

scenarios provide a capability to control and know the<br />

battlefield situation.<br />

175<br />



To ensure an accurate knowledge of the actual situation at<br />

the time of SA assessment, a set of battlefield situations,<br />

vignettes, were developed in which all the informational elements<br />

pertaining to the situation were prespecified and prerecorded.<br />

Prespecification ensured standardization of situation<br />

determinants.<br />

Prerecorded materials for the vignettes included simulation-based<br />

files designating commander and friendly unit locations,<br />

operational overlays to be displayed on the commander's Command<br />

and Control Display (CCD), and message sets to be received on his<br />

CCD during the vignette which would provide updates on his<br />

battlefield situation.<br />

At the start of each vignette, the<br />

commander was provided a map and map board with acetate<br />

operational and note overlays, and a brief description of the<br />

battlefield situation leading up to the vignette.<br />

The tactical situation for the vignette placed the commander<br />

in his tank simulator occupying a stationary defensive battle<br />

position (BP) for a delay-in-sector mission. The time frame for<br />

the vignette began after the postulated successful delay of<br />

initial enemy elements by his unit. The vignette was terminated<br />

prior to his unit's displacement to a subsequent BP.<br />

Immediately after a 10-minute message reception and<br />

processing phase, the commander was escorted out of his simulator<br />

to an adjacent workstation. He retained the map and map board<br />

used while receiving messages, but the operational and note<br />

overlays were replaced with another acetate sheet depicting only<br />

his own BP and the BPs of the adjacent companies. The commander<br />

was given one version of both the plotting and "seeing"<br />

questionnaires, described in the following section, and 10 minutes<br />

to record his answers. For each vignette, one<br />

questionnaire pertained to the current situation and the other to<br />

the future situation in a counterbalanced sequence across the<br />

series of vignettes.<br />
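As a small illustration of this counterbalancing, the assignment of which questionnaire version addresses the current versus the future situation can simply be alternated across vignettes. The sketch below is illustrative only; the vignette labels and the simple alternation scheme are assumptions, not the study's actual schedule.<br />

    # Illustrative counterbalancing sketch (hypothetical vignette labels; the
    # alternation scheme is an assumption, not the study's actual schedule).
    from itertools import cycle

    orders = cycle([("current", "future"), ("future", "current")])
    for vignette, order in zip(["Vignette 1", "Vignette 2", "Vignette 3", "Vignette 4"], orders):
        print(vignette, "->", order)
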

SA Instruments<br />

The primary goals in the development of the situational<br />

measures for this effort were (a) to develop a set of items that<br />

addressed each of the primary levels of SA for small unit ground<br />

commanders, and (b) to develop a response format that supported<br />

objective scoring of a commander's SA responses.<br />

Perception: Plotting<br />

To capture commander's perception of the situation, a<br />

situational awareness form was developed which required<br />

commanders to plot on a military map the locations of reported<br />

enemy units, friendly units, and key control measures. The<br />

location data selected for these items were based on SME's<br />

176<br />



Table 1<br />

Situational Awareness Items: Plotting the Battlefield Situation<br />

Current Situation Future Situation<br />

Largest unit engaged<br />

Largest unit approaching<br />

Friendly scout unit<br />

Target reference points<br />

Largest unit outside sector<br />

Support unit to rear<br />

Company's subsequent BP<br />

Obstacle(s) to rear<br />

Enemy scouts to rear<br />

Mortar unit to rear<br />

estimates of the more important location information provided<br />

during the vignette. A five-item series of plotting questions<br />

was developed for both the commander's current situation and<br />

future situation as indicated in Table 1.<br />

The current situation was defined by informational elements<br />

of more immediate concern to the commander including enemy<br />

elements currently being engaged by his unit. The future<br />

situation was defined by less immediate information including<br />

enemy units in the area but well beyond current range, or<br />

information related to his next location, the subsequent BP.<br />

Comprehension and Projection: "Seeing"<br />

To assess the commander's comprehension and projection of the<br />

battlefield situation, a second SA form was developed. Items on<br />

this form required commanders to compile isolated report<br />

information into aggregate reports, to estimate the size of<br />

designated enemy units including main and attacking units, and to<br />

project the impact of the information received on his unit's<br />

current and future situations. Five close-ended items were<br />

developed for both the current and the future situation, Table 2.<br />

For the current situation the items addressed the commander's<br />

ability to comprehend the more immediate battlefield situation to<br />

the front of his current BP. The first two items required him to<br />

compile reported information received during the vignette into<br />

summary reports detailing the number and type of enemy units<br />

destroyed and damaged by his company, and the number and type of<br />

enemy units still approaching his current BP. The remaining<br />

items addressed the commander's ability to go beyond the data<br />

actually reported, to understand the nature of the threat facing<br />

both his company unit and the overall task force. These items<br />

asked the commander to estimate in turn the size and type of the<br />

enemy unit actually engaged, the unit approaching his company,<br />

and the total unit committed against the overall task force.<br />

177


Table 2<br />

Situational Awareness Items: "Seeing" the Battlefield Situation<br />

Current Situation                              Future Situation<br />

Number & type enemy damaged                    Distance/direction to main unit<br />

Size & type unit engaged                       Heading of main enemy unit<br />

Number & type unit approaching                 ETA main unit < 2,000 meters<br />

Size & type force approaching                  Distance/direction next BP<br />

Overall size & type unit                       Impact of obstacle(s) on unit's<br />
confronting the task force                     next BP<br />

For the future situation, the items addressed the commander's<br />

ability to project beyond his immediate situation and use the<br />

information provided during the vignette to anticipate upcoming<br />

events. The initial items focused on the commander's awareness<br />

of the main enemy unit approaching his company sector. Reports<br />

received during the vignette had provided information about the<br />

heading and location of a relatively large enemy unit in the<br />

company's sector but well beyond engagement range. The commander<br />

was required to provide the location and heading of this main<br />

unit, and then estimate if, and when, that unit would approach<br />

within 2,000 meters of his current location.<br />

The final two items assessed the commander's awareness of key<br />

information related to his unit's proposed future location. One<br />

item asked him to provide estimates of distance and direction to<br />

his unit's subsequent BP, and the final item asked him to assess<br />

the impact that reported obstacle(s) might have on movement to,<br />

and occupation of, that BP.<br />

Objective Responses: Scoring<br />

A key concern in the construction of the SA items was to<br />

develop a set of questions that clearly specified the situational<br />

information requested. The simulation-based vignettes driving<br />

the scenarios were designed by subject matter experts (SMEs) to<br />

provide a wide range of battlefield reports of differing<br />

relevance to the commander's mission. To ensure commanders<br />

clearly understood what information was being requested for each<br />

item, special attention was given to item wording. The item<br />

stems consistently provided and emphasized, for example,<br />

distinctions between enemy units engaged versus not engaged,<br />

locations in the unit's sector versus adjacent sectors, and<br />

elements to the front versus the rear of the unit's BP location.<br />

To meet the goal for SA instruments that could be objectively<br />

scored, the response formats required commanders to provide<br />

178


answers that precisely indicated their knowledge of the<br />

information requested. For items in which commanders were<br />

required to plot the locations of designated elements, objective<br />

assessment of location accuracy was straightforward. For the<br />

remaining items directed at comprehension and projection of the<br />

situation, a combination of fill-in-the-blank (e.g., enemy type,<br />

number) and multiple choice (e.g., mechanized rifle battalion<br />

versus tank company) item formats were used. SMEs assisted in<br />

the construction of all response options to provide commanders<br />

appropriate and meaningful response alternatives.<br />
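Because each plotted item is compared against a prespecified ground-truth location, the plotting score can be computed mechanically. The minimal sketch below is not the authors' scoring procedure; the distance bands, coordinates, and point values are illustrative assumptions.<br />

    # Hedged sketch: score a plotted location against the vignette's ground truth
    # by distance bands (all values hypothetical).
    from math import hypot

    def score_plot(plotted, actual, full_credit_m=250, partial_credit_m=1000):
        """Return 2, 1, or 0 points for a plotted (easting, northing) in meters."""
        error_m = hypot(plotted[0] - actual[0], plotted[1] - actual[1])
        if error_m <= full_credit_m:
            return 2
        if error_m <= partial_credit_m:
            return 1
        return 0

    # Example: the engaged enemy unit is plotted 400 m from its true location.
    print(score_plot(plotted=(54400, 32100), actual=(54000, 32100)))  # -> 1
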

Two pilot sessions with active duty Armor commanders, three<br />

platoon leaders and one company commander per pilot, were<br />

conducted to obtain user feedback on the SA procedures and items<br />

developed. During the first pilot, commanders provided detailed<br />

feedback during structured debriefs. Their comments assisted,<br />

particularly, in the identification of items requiring more clear<br />

or explicit wording. Their recommendations were included in<br />

revisions to the SA measures, and the revised questionnaires used<br />

for the second pilot appeared quite adequate.<br />

SA Utilization<br />

The SA forms are currently being used in ARI's CVCC program<br />

of research to investigate small unit commander's information<br />

requirements. An initial evaluation compared commander's SA as a<br />

function of message sets received on their CCD that differed in<br />

volume, number of messages per set, and relevance to their<br />

battlefield situation. Results of this effort are expected to<br />

provide recommendations for improving the design of this future<br />

automated command and control system. In addition, this data<br />

will be used for empirical validation of the SA method and<br />

instruments described.<br />

A follow-on baseline evaluation of commanders using only<br />

conventional FM radio systems without a CCD will provide<br />

comparison data on the speed and accuracy of the CCD for<br />

receiving and relaying battlefield communications. As an<br />

additional dependent measure, the SA instruments will provide<br />

comparison data on the CCD's ability to help the commander<br />

integrate command and control information into a more accurate<br />

awareness of his battlefield situation.<br />

References<br />

Endsley, Mica R. (1988). Situation awareness in aircraft systems.<br />

In Proceedings of the Human Factors Society 32nd Annual<br />

Meeting, 1, 96-101. Santa Monica, CA: The Human Factors<br />

Society.<br />

Fracker, Martin L. (1988). A theory of situation assessment:<br />

Implications for measuring situation awareness. In Proceedings<br />

of the Human Factors Society 32nd Annual Meeting, 1, 102-115.<br />

Santa Monica, CA: The Human Factors Society.<br />

179<br />



An Aviation Psychological System for Helicopter<br />

Pilot Selection and Training<br />

F.Fehler<br />

Consulting Psychologist, German Army Aviation School, Bückeburg<br />

1. Current Situation in Aviation Psychology<br />

In Germany, aviation psychology looks back on an impressive<br />

history which had its beginnings way back in 1916 as some<br />

mythical accounts would have it. Although it is untrue that the<br />

late "Red Baron" made the acquaintance of aviation<br />

psychologists, it is certainly true to say that all German<br />

military pilots since the end of WW I have been confronted with<br />

aviation psychology in one way or another, if not with an<br />

actual aviation psychologist, then at least with aviation<br />

psychology methods and instruments. As a general rule, such<br />

instruments would include paper and pencil tests, and boxes<br />

with all kinds of levers, buttons, lights and bells. In the<br />

sphere of aviation, psychology was essentially synonymous with<br />

pilot candidate selection. Presumably this is also true for<br />

other countries where aviation psychology is practiced.<br />

On the other hand, aviation psychologists were surprisingly<br />

hesitant in touching two other important areas of aviation,<br />

namely<br />

- pilot training<br />

- psychological support for aviators.<br />

Obviously, this is a short-sighted attitude, for it is the<br />

training that will show whether or not the previous<br />

psychological screening methods were successful. Psychologists<br />

should therefore attend flight training, either by making<br />

active contributions or by merely acting as observers, to<br />

ensure that the criteria applied to conducting the training and<br />

to assessing the achievements made are the same as those that<br />

were applied to devising and evaluating their own test methods.<br />

Any other approach would not lead to representative validation<br />

coefficients.<br />

An aviation psychologist who descends from the heights of his<br />

ivory tower research to offer his knowledge to an aviation<br />

school and commit himself to solving its practical, everyday<br />

problems will soon find himself left alone and discover that he<br />

does not have the psychological tools required. What is the<br />

reason for this and what can be done about it?<br />

180<br />


2. Identifying The Problem<br />

2.1. Screening Methods Used By Aviation Psychologists<br />

The need to screen pilot candidates is undisputed; screening does<br />

not only serve the purpose of making the training cost-effective,<br />

it also intends to save unsuitable applicants from having to abort<br />

a career. The problem of choosing suitable applicants seems to be<br />

an easy task for the practitioner in psychology, as he can choose<br />

freely from a plenitude of psychological methods that have been<br />

accumulated by two generations of psychologists having done<br />

extensive research in this particular area. On looking more<br />

closely at the existing literature, however, he will discover the<br />

following: the best achievements made so far are validation<br />

coefficients that lie around r=.5 in the most favorable cases!<br />

For the selection of applicants this means that he has to apply<br />

the most uncompromising cut-off-scores if he wants to satisfy the<br />

management with less than 10 % of candidates who have to be washed<br />

out from pilot training. It is obvious that such an approach would<br />

be synonymous with a sharp increase in the percentage of<br />

mistakenly rejected candidates, which is totally unacceptable<br />

unless one can draw on a large number of applicants. The latter is<br />

not the case in German army aviation. This means that the<br />

conventional testing methods are exhausted. Now that the old test<br />

methods have been modified and renamed over decades it is hard to<br />

imagine the occurrence of a major breakthrough to yield validation<br />

coefficients that are clearly above r=.5.<br />
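The consequence of a validity ceiling near r=.5 can be made concrete with a small simulation. The sketch below is not from the paper: it assumes a bivariate-normal relationship between test score and training outcome, a 30% base washout rate, and a cutoff at the 80th percentile, all purely illustrative values.<br />

    # Hedged simulation of selection with predictor validity r = .5
    # (base rates, cutoff, and sample size are illustrative assumptions).
    import numpy as np

    rng = np.random.default_rng(0)
    r = 0.5
    cov = [[1.0, r], [r, 1.0]]
    test, outcome = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T

    would_succeed = outcome >= np.quantile(outcome, 0.30)   # assume bottom 30% wash out
    selected = test >= np.quantile(test, 0.80)              # strict cutoff: top 20% accepted

    washout_among_selected = 1 - would_succeed[selected].mean()
    suitable_but_rejected = (would_succeed & ~selected).mean() / would_succeed.mean()
    print(f"washout rate among selected: {washout_among_selected:.1%}")
    print(f"suitable applicants rejected: {suitable_but_rejected:.1%}")

Under these assumed numbers the washout rate among those selected approaches the 10% target, but roughly three quarters of the suitable applicants are rejected, which is untenable without a large applicant pool.<br />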

2.2. Pilot Training<br />

The contribution aviation psychology has made to pilot training is<br />

in fact very small when one looks at the contributions made in the<br />

field of pilot candidate selection. There is not much point in<br />

taking enormous pains to choose student pilots and then abandon<br />

them to their fates. In the German Army Aviation Branch, the<br />

psychologist will look after every student pilot who gets into<br />

trouble in the course of flight training. This approach has helped<br />

to gain the following experiences:<br />

Problems resulted from<br />

- flight instructors being unable to establish a personal<br />

relationship with the student<br />

- vague learning objectives which were not clearly understood by<br />

students and interpreted differently among instructors<br />

- inconsistent teaching methods<br />

- the structure of the training program being based on a too<br />

demanding learning progression and paying no attention to<br />

students' individual training needs and learning speeds.<br />

181


In his attempt to overcome these difficulties, the aviation<br />

psychologist learns that he lacks an important tool: to develop<br />

and to test training concepts he must have free access to a full<br />

mission simulator and to helicopters. Availability and flexible<br />

use of both these types of systems under varying experimental<br />

conditions are extremely limited due to severe operational<br />

restrictions.<br />

2.3. Further Tasks Of Aviation Psychology<br />

The final success of the screening methods used in aviation<br />

psychology also depends on the following aspects:<br />

+ training provided to instructors<br />

+ prophylactic stress prevention and coping programs<br />

+ analysis of pilot behavior displayed in the cockpit.<br />

From all these tasks one can conclude that the aviation<br />

psychologist needs a simulator-like system the primary asset of<br />

which is the capability to simulate psychological demands rather<br />

than maximum realism in terms of aircraft control.<br />

3. Solution<br />

The solution to the problems arising from pilot candidate<br />

selection, flight training, ergonomics and psychological support<br />

must be looked for in the cockpit. However, the cockpits of full<br />

mission simulators or actual aircraft are not suited for the<br />

purposes of a scientific analysis for various reasons. This means<br />

that an aviation psychologist has to develop his own cockpit<br />

optimized for his specific aims. Such a concept will be described<br />

in the following paragraphs. The procurement of this system, called<br />

"Aviation Psychological System/Helicopter (APS/H)", for the German<br />

Army Aviation Branch has already been initiated.<br />

3.1. Task Structures<br />

The task structures of the APS/H are based on psychological problems<br />

that typically arise in the course of actual training missions:<br />

3.1.1. Problem Area: Psychomotor Control<br />

The intricacies of controlling an aircraft may lead to psychomotor<br />

problems. The APS/H will therefore be equipped with all the<br />

controls that can be found in a helicopter. Control inputs made<br />

with the APS/H controls will not only convey the same feeling as<br />

those of a real helicopter; it will also be possible to change the<br />

control sensitivity over a wide range.<br />

182


3.1.2. Problem Area: Handling Complexity<br />

Sophisticated helicopter cockpits give rise to individual handling<br />

difficulties. It will therefore be possible to analyze these<br />

difficulties by switching displays and control elements on and off<br />

and thereby vary handling complexity.<br />

3.1.3. Problem Area: Mission Demands .<br />

Certain flight missions and the demands that go with them may<br />

overtax the pilot mentally or physically. The APS/H therefore will<br />

have the capability to simulate all individual tasks and<br />

requirements pilots have to fulfil during typical missions.<br />

3.1.4. Clinical Aviation Psychology<br />

The special mission requirements inherent of flying military<br />

helicopters may also push experienced pilots to their performance<br />

limits. The APS/H is therefore designed such that flying-related<br />

requirements and psychotherapeutical measures (e.g.<br />

desensitisation, autogenic training etc.) can be combined with one<br />

another.<br />

3.1.5. Aviation Psychological Research<br />

It is a matter of course that designers of sophisticated equipment<br />

develop special test set-ups to be applied throughout the test<br />

phase in order to analyze and evaluate the operating performance<br />

of the device under development. APS/H will be a similar tool<br />

which will not only be used for solving ergonomic problems but<br />

also for<br />

- developing training systems and methods, and for<br />

- testing crew concepts.<br />

3.2. Realism Requirements For The APS/H<br />

Simulator realism is not an end in itself. An improvement in<br />

realism does not automatically improve simulator efficiency. An<br />

increase in realism will primarily go hand in hand with a linear<br />

increase in cost. So what is the degree of realism required for<br />

the APS/H to ensure maximum efficiency?<br />

3.2.1. Motor Realism<br />

Operation of the controls and the latency between control input and<br />

instrument display need to be as close to reality as possible<br />

since automatic handling patterns internalized by the pilots would<br />

make it difficult to overcome the effects of negative transfer.<br />

183<br />



3.2.2. Motion Realism<br />

Motion is essentially perceived by the visual organs. APS/H can<br />

therefore do without a motion system. Nevertheless, vibrations<br />

typical of helicopter flying will be generated by fitting a<br />

vibration device to the pilot seat.<br />

3.2.3. Visual System Realism<br />

The APS/H needs a visual system for the following purposes:<br />

+ flight attitude-related visual feedback<br />

+ visual cueing for landing approaches<br />

+ projection of obstacle images for nap-of-the-earth flying<br />

+ projection of images of prominent terrain features for visual<br />

navigation.<br />

All these images will be schematic in nature. It should be clear<br />

that no gain will be made in dealing with the above-mentioned<br />

tasks by adding the image of leaves to the trees simulated.<br />

3.2.4. Acoustic Realism<br />

Realism in motion cueing will be enhanced by a realistic<br />

simulation of environmental sound patterns. This creates the need<br />

for an acoustic system with dummy head microphone quality via a<br />

head set.<br />

All in all, a detailed analysis shows that the level of realism<br />

required for the APS/H need not be extraordinarily high to serve<br />

its purpose. Especially in the field of visual systems design,<br />

schematic images will do and thereby reduce overall costs.<br />

4. Summary<br />

Conventional test methods (paper/pencil etc.) are firmly<br />

established tools to be applied in all phases of pilot candidate<br />

selection processes, but it should be borne in mind that their<br />

validity is limited and cannot be improved considerably, as can be<br />

seen from the experiences gained. An in-depth analysis of the<br />

psychic potential required for meeting flying demands presupposes<br />

the existence of methods that are in keeping with real flying<br />

demands and, additionally, permit the application of scientific-experimental<br />

criteria. When one looks at physicists who, in search<br />

of minute particles, venture to demand equipment of inconceivable<br />

dimensions and are actually provided with it, then it is justified<br />

to say that the outlined APS/H designed to study behavioral<br />

patterns of helicopter pilots is a fairly modest demand.<br />

184


Analyzing User Interactions<br />

With Instructional Design Software<br />

J. Michael Spector<br />

Daniel J. Muraida<br />

Air Force Human Resources Laboratory<br />

Brooks AFB, TX 78235-5601<br />

Abstract<br />

Many researchers are attempting to develop<br />

automated instructional design systems to guide subject<br />

matter experts through the courseware authoring<br />

process. What appears to be lacking in a number of<br />

existing research and development efforts, however, is<br />

a systematic method for analyzing the interplay between<br />

user characteristics, the authoring tool's structure<br />

and organization, and the resulting quality of<br />

computer-based instruction (CBI). This paper describes<br />

the initial application of a particular approach that<br />

focuses on the analysis of inputs, processes, and<br />

outputs that occur in human-computer interactions (HCI)<br />

between end users and a prototype of a CBI design tool.<br />

Instructional Systems Design (ISD) is an established process<br />

for designing and developing instructional materials. ISD models<br />

were first elaborated in the 1950's using a behavioral learning<br />

paradigm and have since undergone many revisions and refinements<br />

(Andrews & Goodson, 1980). Traditionally, ISD has,been viewed as<br />

the practical application of knowledge about learning and tasks<br />

to be learned to the design of instruction (Gagne, 1985).<br />

Many researchers have pointed out the need to provide an<br />

update of ISD based on the findings of cognitive science<br />

(Tennyson, 1989). What is also needed is an update of ISD that<br />

takes into account computer-based interactive methods for<br />

presenting instruction (Muraida, Spector, & Dallman, 1990).<br />

Using computers to design, develop, and deliver instruction<br />

complicates ISD considerations. Some instructional strategies<br />

appropriate for certain classroom-based settings are not<br />

appropriate for certain computer-based settings. For example,<br />

some common classroom strategies involve the teacher making<br />

provocative statements and asking leading questions. Likewise,<br />

it is possible to construct alternate computer models of various<br />

devices and simulate their performance; this is not easily<br />

possible in a classroom. As a result, instructional strategy<br />

differences exist between classroom and computer settings.<br />

In addition, the design of computer-based instruction (CBI)<br />

185<br />



must be accomplished with great care. In a classroom, there is<br />

usually an alert and experienced teacher to compensate for<br />

unclear or inadequate instructional presentations. In a computer<br />

setting, it is essential that the initial instructional presentation be clear;<br />

otherwise, the instruction is likely to fail. Courseware is<br />

computer software that is designed for instructional purposes.<br />

Courseware that is not carefully designed is most likely to be<br />

expensive and ineffective (Jonassen, 1988). As a consequence, to<br />

make optimal use of CBI it will be necessary to develop<br />

techniques for evaluating the success and efficiency of various<br />

ISD methodologies applied in computer-based settings.<br />

Problem<br />

CBI has proven to be an appropriate instructional solution<br />

in many settings (Hannafin and Peck, 1988). CBI has also proven<br />

to be expensive and often ineffective (MacKnight & Balagopalan,<br />

1989). What is needed, then, is a means to insure that CBI<br />

course designs are effective and produced in a cost-effective<br />

manner.<br />

There are two aggravating factors to this problem: 1) It is<br />

often true that courseware developers have had no special<br />

training in computer-based methodologies, and 2) It is not<br />

completely clear what cognitive aspects of learning are best<br />

instructed using various computer-based methodologies. In short,<br />

in determining how to optimize CBI developments it will be<br />

necessary to determine how novice and experienced CBI developers<br />

interact with the courseware authoring environment, and it will<br />

also be necessary to evaluate the success of the resulting<br />

courseware.<br />

The methodology proposed below represents an attempt to<br />

build an initial model of CBI authoring that can eventually be<br />

used as a predictor of success when combining particular<br />

courseware authoring environments, CBI developers, subject<br />

matter, and student populations. The Air Force Human Resources<br />

Laboratory (AFHRL) is interested in refining this model in order<br />

to evaluate the usability of transaction shells (Merrill, Li, &<br />

Jones, 1990) in the Advanced Instructional Design Advisor (AIDA),<br />

an automated and integrated set of tools to facilitate and guide<br />

the process of developing effective courseware (Muraida &<br />

Spector, 1990).<br />

The AIDA project focuses on the design and development of<br />

CBI (Spector, 1990). It is assumed that the Air Force will<br />

continue to expand its use of CBI, that the Air Force will<br />

continue to experience a shortage of courseware authors with<br />

backgrounds in instructional technology, and that the subject<br />

matter of immediate interest is maintenance training for<br />

apprentice level maintenance personnel.<br />

To provide CBI design guidance consistent with these<br />

assumptions, AFHRL has decided to pursue the use of intelligent<br />

186


lesson templates. Intelligent lesson templates have preestablished<br />

instructional parameters and are executable upon<br />

input of informational content by a subject matter expert. In a<br />

sense, intelligent lesson templates "know how" to present the<br />

kind of instruction they contain. Experienced instructors can<br />

alter the instructional parameters in order to customize<br />

instruction. The most noteworthy intelligent lesson templates<br />

are Merrill's transaction shells (Merrill et al., 1990).<br />

AFHRL and Merrill signed a Memorandum of Agreement wherein<br />

Merrill loaned two transaction shells to AFHRL for purposes of<br />

evaluation. AFHRL is using these transaction shells to develop a _<br />

model of CBI authoring interactions that affect the productivity<br />

and the quality of developed CBI courseware.<br />

Methodology<br />

The purpose of the initial evaluation study of Merrill's<br />

transaction shells was to develop a working model of user<br />

interactions with instructional design software. In addition to<br />

determining if Merrill's transaction shells with particular user<br />

interfaces were worthy of refinement and continued development,<br />

the aim was to establish an initial model with relevant<br />

characteristics that predict user success with other authoring<br />

environments.<br />

The answer to the question about the value of using<br />

transaction shell technology is that transaction shell technology<br />

appears to provide a very usable and productive courseware<br />

authoring environment. Details are elaborated in subsequent<br />

sections of this report.<br />

The primary question, however, concerned the establishment<br />

of a model of courseware authoring interactions that would<br />

influence the productivity and quality of a CBI authoring<br />

environment. Because all of the relevant characteristics were<br />

not known ahead of time, an approach that allowed iterative<br />

refinement of a quantifiable and predictive model was required.<br />

Falk's soft modeling technique satisfied this requirement and was<br />

used to guide the design of the study (Falk, 1987).<br />

The initial phase of developing a soft model consists of<br />

identifying inputs, processes, and outputs that are relevant to<br />

the task being modelled. Weighted links between input and<br />

process measurements and output measurements are then<br />

hypothesized. Additional subjects are then tested using the<br />

proposed tentative model. The model and its associated measures<br />

and weights are modified to reflect the outcome of new subjects.<br />

New input, process, or output measurements may be added as deemed<br />

necessary in the model development phase. Over time, the model<br />

stabilizes and can be used as a predictive or analytical tool.<br />
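As a rough illustration of the weighted-link idea (this is not Falk's soft-modeling algorithm, and all variable names and values below are invented), the hypothesized links can be pictured as a small linear model whose weights are re-estimated whenever a new subject's measurements are appended:<br />

    # Hedged sketch: iteratively refit weights linking input/process measures to
    # an output measure as new subjects are observed (all data hypothetical).
    import numpy as np

    # Columns: instructional experience (yrs), computer experience (courses),
    # on-line authoring time (hrs); last column: peer-rated courseware quality.
    data = np.array([
        [5.0, 2.0, 14.3, 7.5],
        [1.0, 6.0, 20.0, 6.0],
        [8.0, 4.0, 11.0, 8.5],
    ])

    def refit_weights(data):
        X = np.column_stack([np.ones(len(data)), data[:, :-1]])
        return np.linalg.lstsq(X, data[:, -1], rcond=None)[0]

    weights = refit_weights(data)

    # A new subject arrives: append the measurements and refit, so the model's
    # hypothesized links are revised as described in the text.
    data = np.vstack([data, [[3.0, 1.0, 30.8, 7.0]]])
    weights = refit_weights(data)
    print(weights)
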

Initial input measures for this soft model included the<br />

187


following: instructional experience, subject matter experience,<br />

computer experience, and cognitive style. Some of this data was<br />

gathered by direct questioning and was easily quantified (e.g.,<br />

number of years of teaching experience). Cognitive style was<br />

determined by questioning and by observation, and was not as<br />

easily quantified. Some aspects of computer experience were<br />

easily determined by questioning (e.g., number of computer<br />

courses taken), but other aspects were not as straightforward<br />

(e.g., level of expertise with an operating system).<br />

Initial process measures for this soft model included the<br />

following: time spent on an authoring event, sequence chosen for<br />

authoring events, number of revisions attempted and accomplished,<br />

and purpose of revisions. Again some of these processes were<br />

easy to measure and to quantify, but other processes were more<br />

difficult to assess. For example, it was easy to measure how<br />

long an author spent indicating the particular function of a<br />

device that was part of the lesson content. However, determining<br />

the purpose of a particular revision without interrupting the<br />

integrity of the authoring process was more difficult. The only<br />

way to accomplish this was to note that a revision had been made,<br />

look at the revision, and, if its purpose was not obvious (e.g.,<br />

correcting a misspelled word was an obvious revision), ask the<br />

author after the session about the purpose of the<br />

revision.<br />

Initial output measures for this soft model included the<br />

following: total time to produce the lesson module, total cost<br />

to produce the lesson module, student achievement on tests,<br />

retention, student motivation concerning the material, level of<br />

interactivity of the lesson, instructor motivation to use the<br />

authoring environment in the future, and peer review by other<br />

instructional developers. Once again some of these measures are<br />

direct and straightforwardly quantifiable (e.g., total<br />

development time, student scores, etc.), while some are indirect<br />

and more qualitative (e.g., instructor and student motivation).<br />

The initial subject was observed completing a lesson module<br />

to teach the names, locations, and functions of 125 parts in<br />

the T-37 cockpit. The subject's experience was determined in an<br />

extensive interview prior to the study. The subject's motivation was<br />

observed throughout the study. In addition, the subject was<br />

queried midway through the study concerning his progress and<br />

problems encountered. The subject also kept a diary of authoring<br />

events, including problems encountered and general impressions.<br />

Results<br />

The relevant input measures of the subject were as follows:<br />

1) Medium instructional experience, 2) High subject matter<br />

experience, 3) Low computer experience, and 4) Reflective<br />

cognitive style with a self-directed locus of control. A formula<br />

for connecting each of these factors with output measures is<br />

currently being developed and will be tested in the second<br />


188


iteration of the evaluation study.<br />

The relevant process measures were as follows: 1) 4.75<br />

hours in introductory exercises, 2) 14.25 hours in on-line<br />

authoring, 3) 11.83 hours in off-line design and planning, 4) 10<br />

groupings, nested 3 levels deep, with a total of 21 lesson<br />

modules, top level module completed first, teaching 125 parts, 5)<br />

20 picture files identified and utilized, with minor revisions<br />

requested for 4, 6) Approximately two minor revisions per module,<br />

7) Approximately 5 minutes of debugging per individual module,<br />

and 8) Complete linkage of all modules into a course module in 20<br />

minutes. This data was collected by observation. The software<br />

has since been modified to collect and record this data<br />

automatically (Canfield & Spector, 1990).<br />

The relevant output measures were as follows: 1) 30.83<br />

hours in total development time (graphics were produced by<br />

support personnel and graphic production time is not included),<br />

2) 3 plus hours expected for student instructional time, 3) cost<br />

data not available, 4) student scores and motivation not<br />

available, 5) medium level of interactivity, 6) high instructor<br />

motivation (wants to be included in follow-on studies), and 7)<br />

acceptable quality of courseware (will be administered to cadets<br />

in lieu of current instruction).<br />

The subject's diary and responses to interview questions<br />

indicated a sustained high level of motivation and satisfaction<br />

with the authoring tool in spite of known deficiencies<br />

(occasional mouse failures). The subject experimented with<br />

default instructional parameters during the exercises but rarely<br />

changed the defaults for the instruction he developed. More<br />

specifically, the subject chose timed presentations for the<br />

student practice interaction rather than learner control. The<br />

subject also modified the default testing parameters to reflect 3<br />

samples per item instead of 2 and a criterion level of 75%<br />

instead of 90%. In addition, the subject altered allowable<br />

interactions per individual lesson as appropriate, which<br />

reflected complete understanding of the transaction shell<br />

environment.<br />

Conclusion<br />

This initial study prompted the addition of automatic data<br />

collection for both instructors and students to the transaction<br />

shell software. The general results indicate a high level of<br />

acceptability and productivity using transaction shells to author<br />

courseware. Assessment of the quality of the CBI produced has<br />

yet to be completed, although initial data collection on student<br />

performance is underway.<br />

Initial indications are that students require in excess of<br />

3 hours to complete the course module. This means that the<br />

subject's development time to instruction time ratio using this<br />

tool was approximately 10:1. Using traditional authoring tools<br />

189<br />



for this type of material (ignoring the time to create graphics)<br />

would have involved a 200:1 development to instruction time ratio<br />

(Lippert, 1989). Both the tool and the model are worth refining.<br />

References<br />

Andrews, D. H. & Goodson, L. A. (1980). A comparative analysis<br />

of models of instructional design. Journal of Instructional<br />

Design, 3(4), 2-16.<br />

Falk, R. F. (1987). A Primer for Soft Modeling. Berkeley, CA:<br />

University of California Institute for Human Development.<br />

Gagne, R. M. (1985). The Conditions of Learning and Theory of<br />

Instruction. New York, NY: Holt, Rinehart, and Winston.<br />

Jonassen, D. H. (Ed.) (1988). Instructional Designs for<br />

Microcomputer Courseware. Hillsdale, NJ: Lawrence Erlbaum<br />

Associates.<br />

Hannafin, M. J. & Peck, K. L. (1988). The Design, Development,<br />

and Evaluation of Instructional Software. New York, NY:<br />

Macmillan Publishing Company.<br />

Lippert, R. C. (1989). Expert systems: Tutors, tools, and<br />

tutees. Journal of Computer-Based Instruction, 16(1), 11-<br />

19.<br />

MacKnight, C. B. and Balagopalan, S. (1989). Authoring systems:<br />

Some instructional implications. Journal of Educational<br />

Technology Systems, 17(2), 123-134.<br />

Merrill, M. D., Li, Z., & Jones, M. C. (1990). Second generation<br />

instructional design (ID2). Educational Technology, 30(2),<br />

7-14.<br />

Muraida, D. J. & Spector J. M. (1990). The advanced<br />

instructional design advisor (AIDA): An Air Force project<br />

to improve instructional design. Educational Technology,<br />

30(3), 66.<br />

Muraida, D. J., Spector, J. M., & Dallman, B. E. (1990).<br />

Establishing instructional strategies for advanced<br />

interactive technologies. Proceedings of the 12th Annual<br />

Psychology in the DOD Symposium, 12(1), 347-351.<br />

Spector, J. M. (1990). Designing and Developing an Advanced<br />

Instructional Design Advisor (Technical Report AFHRL-TP-90-<br />

52). Brooks AFB, TX: Training Systems Division.<br />

Tennyson, R. D. (1989). Cognitive Science Update of<br />

Instructional Systems Design Models (AFHRL Contract No.<br />

F3365-88-C-0003). Brooks AFB, TX: Training Systems<br />

Division.<br />


190


MILITARY TESTING ASSOCIATION<br />

1990 Annual Conference<br />

FORECASTING TRAINING EFFECTIVENESS (FORTE)<br />

Mark G. Pfeiffer and Richard M. Evans<br />


Naval Training Systems Center and Training Performance Data Center<br />

Orlando, FL<br />

A model was developed to simulate a variety of aviation training device<br />

evaluation outcomes. This simulation model is designed to explore sources<br />

of error threatening the sensitivity of device evaluations. Selection of<br />

evaluation designs is guided by a model that elicits information from<br />

experienced flight instructors. This practical knowledge is transformed<br />

into data that are used in simulating a training effectiveness evaluation.<br />

Effects of variables such as instructor leniency, task difficulty, and<br />

student ability are estimated by two different methods. Available in the<br />

output is an estimate of transfer ratios based on trials-to-mastery, a<br />

diagnosis of deficiencies, an exploration of possible sources of variance,<br />

and an estimate of statistical power and required sample size. Finally, all<br />

data analyses can be accomplished in less than 2 man-days and prior to the<br />

actual field experiment. Estimates of accuracy, reliability, and validity<br />

of the model are high and in an acceptable range.<br />

Background<br />

Major sources of error variance that can mask the true contribution of<br />

a training device to training effectiveness include instructor leniency,<br />

student ability, and task difficulty (McDaniel, Scott & Browning, 1983).<br />

First, instructors' grades are often unreliable criterion measures. Next,<br />

individual abilities among students vary widely. Finally, tasks vary<br />

greatly in difficulty level. Some tasks can be mastered by students in one<br />

or two trials, while others may require 30 trials. These sources of<br />

variance make ratings of students' performance insensitive measures of<br />

training device effectiveness. However, the magnitude can be identified<br />

with sensitivity analysis prior to actual field experiments.<br />

Sensitivity Analysis<br />

Sensitivity analysis is a planning technique (Lipsey, 1983) which<br />

focuses on the impact of variance on variables of interest. The device<br />

evaluation must be carefully planned if the results are to have practical<br />

value and show a true difference between experimental and control groups.<br />

During the planning phase for device evaluations an investment in time may<br />

help identify the problems that introduce unwanted error variance into the<br />

device evaluation. Performance data generated by flight instructors can be<br />

used for this purpose.<br />

The basic framework of the present "sensitivity" analysis differs from<br />

that described by Lipsey (1983) in that it employs the "insensitive"<br />

instructor's rating of students as a performance measure. Lipsey would<br />

rather seek a more sensitive measure. While this rating measure may not be<br />

a particularly good psychometric measure, it is dictated by operational<br />

constraints. Instructors' ratings are used extensively in the transfer of<br />

training literature.<br />

“Approved for public release; distribution is<br />

unlimited.”<br />

191


SIMULATION MODEL<br />

The model described here is designed to simulate experimental dnd<br />

quasi-experimental training effectiveness evaluations of aviation devices.<br />

Values are generated by training experts. Major features of the model<br />

include the following:<br />

. programmable for microcomputers<br />

. extendable to different transfer designs<br />

. helpful in planning field experimental and quasi-experimental<br />

evaluations of devices<br />

. possible data collection by computer or by questionnaire.<br />

Input to the model comes from the ratings made by flight instructors. These<br />

expert judges make estimates of trials-to-mastery needed in the airplane by<br />

replacement pilots with and without prior simulator training using different<br />

device features. Estimates are made by two different methods to permit a<br />

check on cross-method variance and rater reliability.<br />

VARIABLES<br />

In order to gain a perspective of the scope or size of the model it is<br />

helpful for the reader to examine the levels permitted for key variables.<br />

These are shown in table 1. The model is designed so that these limits can<br />

be changed to fit a variety of evaluation designs (Pfeiffer & Browning,<br />

1984).<br />

Table 1<br />

Model Limits<br />

Variable                        Levels Permitted<br />

Treatment (X1)                  Experimental vs. Control<br />

Student Ability (X2)            Fast - Average - Slow<br />

Task Difficulty (X3)            Easy - Average - Tough<br />

Instructor Leniency (X4)        Easy - Average - Tough<br />

192<br />


DATA INPUT<br />

Two methods are provided for entering data: the interactive method and<br />

the additive method. The data from both interactive and additive methods<br />

are compatible with the following evaluation design: (X1) treatment, (X2)<br />

student ability, (X3) task difficulty, and (X4) instructor leniency. All<br />

combinations of two levels for X1 and three levels for X2, X3, and X4<br />

require 54 data elements.<br />

Interactive Method<br />

An expert is asked to estimate the trials required for a replacement<br />

pilot to achieve mastery in the aircraft for each set of training conditions<br />

listed in table 2. These estimates are made twice: first, for the<br />

experimental group (e.g., with prior simulator training) and second for the<br />

control group (e.g., without simulator training). Training conditions and<br />

the data collection instrument for the interactive method are illustrated<br />

below as table 2.<br />

Table 2<br />

Interactive Questionnaire Instrument for Estimating Trials-to-Mastery<br />

ESTIMATED<br />

CONDITION INSTRUCTOR STUDENT TASK TRIALS<br />

1 Easy Fast Easy<br />

2 Easy Fast Tough<br />

3 Easy Slow Easy<br />

4 Tough Fast Easy<br />

5 Easy Slow Tough<br />

6 Tough Fast Tough<br />

7 Tough Slow Easy<br />

8 Tough Slow Tough<br />

The model calls for data on trials-to-mastery for the 27 combinations of<br />

conditions describing the experimental group and the 27 combinations of<br />

conditions describing the control group, a total of 54 conditions. Training<br />

experts need only estimate trials for eight conditions in each group, a<br />

total of 16 conditions. The remaining 38 values (representing the<br />

difference between 16 and 54) are estimated by a regression subroutine in<br />

the model.<br />

193<br />

.


Valuable time of experts is saved by having the model compute intermediate<br />

data elements.<br />
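In other words, the expert rates only the sixteen extreme ("easy/tough") conditions, and the regression subroutine fills in the cells that involve the average levels. The sketch below illustrates that interpolation idea only; it is not the FORTE code, and the level coding and example estimates are assumptions.<br />

    # Hedged sketch: fit main effects to eight corner estimates for one group and
    # predict all 27 instructor x student x task cells (numbers are invented).
    import itertools
    import numpy as np

    corners = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
    estimates = np.array([3, 5, 6, 9, 4, 6, 8, 12], dtype=float)  # trials-to-mastery

    X = np.column_stack([np.ones(len(corners)), corners])
    coef = np.linalg.lstsq(X, estimates, rcond=None)[0]

    grid = np.array(list(itertools.product([-1, 0, 1], repeat=3)), dtype=float)
    predicted = np.column_stack([np.ones(len(grid)), grid]) @ coef
    print(predicted.round(1))  # 27 predicted cells, including the "average" levels
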

Parameters. The parameters identified in table 3 were selected to make<br />

the model flexible, i.e., capable of simulating conditions where the<br />

relative importance of the variables listed can be changed at will by the<br />

analyst. By using a computer terminal, the analyst may input alternative A,<br />

B, C, D, E, or F to establish the relative importance of the variables in<br />

determining expected trials-to-mastery. Relative importance of these<br />

variables is expected to vary from one aircraft community to another.<br />


Table 3<br />

Parameters for Weighting Trials-to-Mastery<br />

Parameter       Relative Importance<br />

A               Instructors   Students      Tasks<br />

B               Students      Instructors   Tasks<br />

C               Tasks         Instructors   Students<br />

D               Instructors   Tasks         Students<br />

E               Students      Tasks         Instructors<br />

F               Tasks         Students      Instructors<br />

Additive Method<br />

The mean trials-to-mastery for the experimental and control groups,<br />

obtained by the interactive method, are used as a basis for the values used<br />

in the additive method. Here the same expert is asked to estimate<br />

trials-to-mastery for each of the conditions one at a time. The questions<br />

are phrased as deviations around the mean trials-to-mastery (table 4).<br />

Training experts estimate six conditions in each group, a total of 12<br />

conditions. The remaining 42 values (representing the difference between 12<br />

and 54) are estimated by the computer model according to the rules of<br />

additive conjoint measurement (Luce & Tukey, 1964).<br />
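A minimal sketch of that additive fill-in is given below; it is not the actual FORTE routine, and the group mean and deviation values are invented. Each factor level contributes an additive deviation around the mean trials-to-mastery:<br />

    # Hedged sketch of the additive fill-in (all numbers hypothetical).
    import itertools

    mean_trials = 6.0
    deviation = {
        "instructor": {"easy": -1.0, "average": 0.0, "tough": 1.5},
        "student":    {"fast": -1.5, "average": 0.0, "slow":  2.0},
        "task":       {"easy": -2.0, "average": 0.0, "tough": 2.5},
    }

    table = {}
    for i, s, t in itertools.product(deviation["instructor"], deviation["student"], deviation["task"]):
        table[(i, s, t)] = (mean_trials + deviation["instructor"][i]
                            + deviation["student"][s] + deviation["task"][t])

    print(table[("tough", "slow", "tough")])  # 12.0
    print(table[("easy", "fast", "easy")])    # 1.5
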

Reljabilitv Check<br />

Since each training expert is asked for inputs to the model by two<br />

different methods, a check on methodological variance is possible by<br />

correlating the values obtained by the interactive and additive methods (N =<br />

54). This correlation is computed across methods for experimental and<br />

control groups.<br />
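Computationally, the check reduces to correlating the 54 cell values produced by the interactive method with the 54 produced by the additive method. A short sketch, with placeholder data standing in for real expert estimates, is:<br />

    # Hedged sketch of the cross-method reliability check (placeholder data).
    import numpy as np

    rng = np.random.default_rng(1)
    interactive = rng.normal(6.0, 2.0, size=54)
    additive = interactive + rng.normal(0.0, 0.5, size=54)  # assume methods largely agree

    r = np.corrcoef(interactive, additive)[0, 1]
    print(f"cross-method correlation (N = 54): {r:.2f}")
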

SUMMARY OF MODEL FLOW<br />

Input, output, and interactive aspects of the model are summarized in<br />

figure 1 and figure 2.<br />

194<br />



Figure 1. Model flow and data estimating procedure.<br />

Figure 2. Analysis of experimental and control groups and data storage.<br />

195<br />



Table 4<br />

Additive Questionnaire Instrument for Estimating Trials-to-Mastery<br />

IF AN AVERAGE STUDENT REQUIRES *N* TRIALS TO LEARN TO<br />

MASTERY, HOW MANY TRIALS WILL A . . . FAST LEARNER REQUIRE?<br />

. . . SLOW LEARNER REQUIRE?<br />

IF AN AVERAGE INSTRUCTOR REQUIRES *N* TRIALS TO TRAIN<br />

STUDENTS, HOW MANY TRIALS WILL . . . AN EASY INSTRUCTOR NEED?<br />

... A TOUGH INSTRUCTOR NEED?<br />

IF *N* TRIALS ARE NEEDED FOR AVERAGE TASKS, HOW MANY<br />

TRIALS WOULD... . . . AN EASY TASK REQUIRE?<br />

. . . A TOUGH TASK REQUIRE?<br />

VALIDATION AND APPLICATION<br />

The model was validated in the helicopter community using a concurrent<br />

validation design. Criterion data for the simulation were collected during<br />

an experimental evaluation of Device 2F64C, an SH-3 simulator located at the<br />

Naval Air Station, Jacksonville, Florida. Trials-to-mastery obtained from<br />

the simulation model were compared with the trials-to-mastery obtained from<br />

the field experiment (Evans, Scott & Pfeiffer, 1984).<br />

SUBJECTS AND PROCEDURE<br />

Thirteen flight instructors currently involved in training pilots in<br />

Device 2F64C were asked to estimate trials-to-mastery by two different<br />

methods. The subjects, one at a time, made their estimates at a computer<br />

terminal. One half-hour per subject was required to complete both the<br />

additive and interactive rating tasks.<br />

VARIABLES<br />

Four independent variables (shown following) were included in the<br />

validation design: (X1) device feature, (X2) student ability, (X3) task<br />

difficulty, and (X4) instructor leniency. All combinations of two levels for<br />

X1 and three levels for X2, X3, and X4 produced 54 data points for a<br />

regression analysis against estimated trials-to-mastery. Trials-to-mastery<br />

(Y) in the aircraft was the dependent variable.<br />

EVALUATION DESIGN SENSITIVITY<br />

The usual purpose of a device feature evaluation is to extract the<br />

variance due to the device features, e.g., visual and motion vs. motion<br />

only. The modeled data can also be used to do a power analysis of the one-trial<br />

difference (actually 1.04) between device features. Power<br />

analysis provides an estimate of the sample sizes needed to demonstrate that<br />

this one-trial difference (experimental mean = 4.61, SD = 1.83; control<br />

mean = 5.65, SD = 2.07) is reliable (Pfeiffer, Evans & Ford, 1985).<br />
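The sample size implied by these figures can be approximated with a standard two-sample power calculation. The sketch below uses the reported means and standard deviations but is otherwise a generic normal-approximation formula, not the authors' exact procedure.<br />

    # Hedged power/sample-size sketch from the reported group statistics.
    from math import ceil, sqrt

    mean_exp, sd_exp = 4.61, 1.83   # experimental group
    mean_ctl, sd_ctl = 5.65, 2.07   # control group

    pooled_sd = sqrt((sd_exp**2 + sd_ctl**2) / 2)     # about 1.95
    d = (mean_ctl - mean_exp) / pooled_sd             # Cohen's d, about 0.53

    # Per-group n for alpha = .05 (two-tailed) and power = .80:
    # n = 2 * (z_alpha/2 + z_beta)^2 / d^2
    z_alpha, z_beta = 1.96, 0.8416
    n_per_group = 2 * (z_alpha + z_beta) ** 2 / d ** 2
    print(ceil(n_per_group))                          # roughly 55-60 per group

Under these assumptions, several dozen students per group would be needed to detect the one-trial difference reliably.<br />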

196


The linear model indicates that the smallest amount of variance is<br />

accounted for by device features (.07). The combined other sources of<br />

variance (instructor leniency, student ability, and task difficulty:<br />

.21 + .27 + .42 = .90) are predicted to mask out the variance due to the device<br />

features. Evaluators could also artificially change their ratings to<br />

reflect the impact of anticipated evaluation design changes. A<br />

reexamination of summary statistics would permit evaluators to assess the<br />

impact of hypothetical design modifications on the anticipated outcome of<br />

the device evaluation.<br />

DISCUSSION<br />

Using data from a simulation model, the training effectiveness analysis<br />

estimated that the one-trial difference between training under the visual<br />

plus motion condition and motion alone would not be statistically<br />

significant with a reasonable sample size (Ott, 1977). This outcome of the<br />

model was confirmed through analysis of actual field data (Evans, Scott &<br />

Pfeiffer, 1984). With this insight from the model, the evaluator of a<br />

device would know in advance that control of task difficulty, student<br />

ability, and instructor leniency in a field experiment would be necessary to<br />

increase statistical power. True training effects attributable to the<br />

device features are more likely to be revealed when extraneous errors are<br />

controlled. Cochran and Cox (1957) have presented a theoretical discussion<br />

of this problem. Instructors' rating variance, for example, may be<br />

controlled by utilizing a standardized method for identifying when the<br />

student has achieved mastery (Rankin & McDaniel, 1980). Some criterion<br />

measure other than instructors' ratings could also be employed. A specific<br />

example is automated performance measurement on the tactical range, which<br />

unfortunately is not widely available for scientific measurement of aircraft<br />

in free flight. However, performance measurement is available in flight<br />

simulators. Computer-aided techniques for providing operator performance<br />

measures have been provided by Connelly, Bourne, Loental and Knoop (1974).<br />

CONCLUSION

This study shows that flight instructors who have knowledge of a<br />

training situation, but who are not necessarily proficient with the

intricacies of research design and statistics can provide data useful for<br />

planning a field experiment (device evaluation). The programs described<br />

herein are "user-friendly" and resident in a portable microcomputer. Should<br />

the computer be unavailable, a questionnaire could be used (Appendix). The<br />

utility of this approach depends, in part, on asking the right questions for<br />

a particular training environment and in part on developing the responses to<br />

such questions into meaningful information. The model just described has<br />

provided that utility for the present situation. Additionally, this model<br />

may be easily adapted to other training problems involving expert ratings (see Pfeiffer and Horey, 1988).

197<br />



REFERENCES<br />

Cochran, W. G., & Cox, G. M. (1957). Experimental designs. New York: John Wiley.

Connelly, E. M., Bourne, F. J., Loental, D. G., & Knoop, P. A. (1974). Computer-aided techniques for providing operator performance measures (AFHRL-TR-74-87). Dayton, OH: Wright-Patterson Air Force Base.

Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34, 571-582.

Evans, R. M., Scott, P. G., & Pfeiffer, M. G. (1984). SH-3 helicopter flight training: An evaluation of visual and motion simulation in Device 2F64C (Technical Report 161). Orlando: Training Analysis and Evaluation Group, Naval Training Equipment Center.

Lipsey, M. W. (1983). A scheme for assessing measurement sensitivity in program evaluation and other applied research. Psychological Bulletin, 94, 152-165.

Luce, R. D., & Tukey, J. W. (1964). Simultaneous conjoint measurement: A new type of fundamental measurement. Journal of Mathematical Psychology, 1, 1-27.

McDaniel, W. C., Scott, P. G., & Browning, R. F. (1983). Contribution of platform motion simulation in SH-3 helicopter pilot training (Technical Report 153). Orlando: Training Analysis and Evaluation Group, Naval Training Equipment Center.

Ott, L. (1977). An introduction to statistical methods and data analysis. North Scituate, MA: Duxbury Press.

Pfeiffer, M. G., & Browning, R. F. (1984). Field evaluations of aviation trainers (Technical Report 157). Orlando: Training Analysis and Evaluation Group, Naval Training Equipment Center.

Pfeiffer, M. G., Evans, R. M., & Ford, L. H. (1985). Modeling field evaluations of aviation trainers (Technical Note 1-85). Orlando: Training Analysis and Evaluation Group, Naval Training Equipment Center.

Pfeiffer, M. G., & Horey, J. D. (1988). Forecasting training device effectiveness: Three devices (Technical Report 88-028). Orlando: Naval Training Systems Center.

Rankin, W. C., & McDaniel, W. C. (1980). Computer aided training evaluation and scheduling (CATES) system: Assessing flight task proficiency (Technical Report 94). Orlando: Training Analysis and Evaluation Group, Naval Training Equipment Center.


198


Cost-Effectiveness of Home Study using Asynchronous<br />

Computer Conferencing for Reserve Component Training 1,2

Ruth H. Phelps, Ph.D.<br />

Major Robert L. Ashworth, Jr.<br />

U.S. Army Research Institute for the<br />

Behavioral and Social Sciences<br />

Heidi A. Hahn, Ph.D.<br />

Idaho National Engineering Laboratory<br />

Abstract<br />

The resident U.S. Army Engineer Officer Advance<br />

Course was converted for home study via asynchronous<br />

computer conferencing (ACC). Students and instructors

communicated with each other using computers at home,<br />

thus creating an "electronic classroom". Test scores,

completion rates, student perceptions and costs were<br />

compared to resident training. Results showed that ACC performance equaled resident performance and that ACC costs were lower.

Geographical dispersion, limited training time and civilian<br />

job and family demands make travel to resident schools for<br />

training and education difficult for the Reserve Component (RC).<br />

Not only is it a hardship for soldiers to leave jobs and family,<br />

but their units are unable to conduct collective training when<br />

soldiers are absent. In addition, training soldiers at resident<br />

schools has become so costly that HQ TRADOC has proposed a 50%<br />

reduction in the number of soldiers traveling to resident<br />

training by 2007 (TRADOC PAM 350-4).<br />

The purpose of this paper is to summarize an investigation<br />

of an alternative means for meeting the educational requirements<br />

of the RC. The goals are to (1) develop and test a new training<br />

option, using asynchronous computer conferencing (ACC), that<br />

1 These data are summarized from Hahn, H., Ashworth, R., Wells, R., & Daveline, K. (in preparation). Asynchronous Computer Conferencing for Remote Delivery of Reserve Component Training (Research Report). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

2 This paper is not to be construed as an official Department of the Army document in its present form.

199


would not require soldiers to leave their homes and units and<br />

yet maintain the quality of training typically found at the<br />

branch school; and (2) determine the cost-effectiveness of

developing and operating the ACC alternative.<br />

Asynchronous computer conferencing is a means for<br />

communicating from different locations at different times (i.e.,<br />

asynchronously) using a computer network. For training<br />

purposes, an "electronic classroom" is established by connecting

all students with each other and the instructional staff. A<br />

student or instructor can participate in the classroom from any<br />

location using existing telephone lines and a computer equipped

with a modem. Students can work together in groups, ask<br />

questions of the instructors, tutor their classmates or share<br />

their thoughts and experiences. Instructors can direct<br />

individual study, conduct small group instruction, answer<br />

questions, give remedial instruction and provide exam feedback<br />

to the students.<br />

Method

Participants

Fourteen RC officers (13 males, 1 female) took Phase III of the Engineer Officer Advanced Course (EOAC) by ACC home study.

For comparison purposes, performance data were collected from<br />

RC students taking the same course in residence at the U.S. Army<br />

Engineer School from October, 1986 to June, 1989.<br />

The instructional staff consisted of a civilian full-time<br />

course manager/administrator responsible for the overall<br />

operation of the course and three part-time instructors. The<br />

part-time instructor responsibilities included directing group<br />

discussions, remedial instruction and/or monitoring student<br />

progress.<br />

Course Description

Course materials consisted of Module 6 of the EOAC (66<br />

program hours of instruction). Media used included paper based<br />

readings and problems, computer-aided instruction, video tapes<br />

and computer conferencing discussion. Topics covered were Army<br />

doctrine (e.g., rear operations), technical engineering (e.g.,<br />

bridging, flexible pavements), leadership and presentation<br />

skills. The program of instruction was identical for the ACC<br />

and resident classes.<br />

200<br />



Equipment, Procedure and Data Analysis

Each student was provided with an IBM XT computer with 20<br />

megabyte hard disk, color monitor and printer. Software and<br />

courseware loaded on each computer consisted of: (1) a<br />

specially developed course management system and communications<br />

package; (2) computer-assisted instruction and tests; (3) word

processing package; (4) spreadsheet.<br />

Communication software for asynchronous computer<br />

conferencing was provided through U.S. Army Forum, Office of the<br />

Director of the Army Staff. The host computer was located at<br />

Wayne State University and used the CONFER II conferencing<br />

software system.

The course was conducted from September, 1988 to April,<br />

1989. Students were mailed all their computer equipment with<br />

written assembly and operation instructions and course<br />

materials. In addition they were provided with a toll free<br />

"hot line" telephone number for resolving hardware/software<br />

problems. The first lessons to be completed were self-conducted<br />

and designed to familiarize the student with the operation of<br />

the computer and software. Scores for computer training were<br />

not included in overall course grades.<br />

Part-time instructional staff were provided the same<br />

equipment and software as the students. In addition they were<br />

given a 40 hour training course on operating the hardware/<br />

software, instructional responsibilities and<br />

teaching/motivational techniques. Instructional staff and<br />

researchers met together to conduct this training using a<br />

combination of lecture and hands-on practice with the computer.<br />

There were four types of data collected: (1) test, practical exercise, and homework scores; (2) pre- and post-course student perceptions of their amount of knowledge on the course topics; (3) course completion; and (4) cost of converting and executing the course. Comparisons of the resident to the ACC course were made using multivariate analysis of variance procedures

for a two-group design.<br />

Results<br />

As shown in the top of Table 1, there was no reliable<br />

difference between the test scores of students in residence<br />

versus ACC. A comparison of the students' self ratings of their<br />

level of knowledge before and after the course, showed that the<br />

ACC group had significantly greater gains in their perceived<br />

amount of learning, as shown in the bottom of Table 1.<br />

Completion data showed that 95% of resident students completed<br />

the course compared to 64% of the ACC students.<br />

201<br />



Table 1<br />

Student Scores and Ratings

Scores                        ACC       Resident    Significance
Tests                         92.0%     86.4%       NS
Homework                      88.8%     92.0%       NS
Practical Exercise            90.4%     89.9%       NS
Perceived Amount Learned      33%       12%         p<.05
  (% Post-Pre)

Cost data were computed separately for (1) converting an<br />

existing course for delivery by ACC and (2) executing each<br />

iteration of the course. If the conversion were done<br />

by within-government staff, then the cost would be approximately<br />

$296,100. If it were done under contract, then the cost is<br />

estimated at $516,200. Start-up costs of equipment purchase and<br />

instructor training were estimated to be $73,100 for within-government and $96,000 for contractor. Costs that will recur with each iteration were estimated at $234,400 for within-government

and $420,900 for contractor.<br />


Figure 1. Relative costs of EOAC alternatives over 10 course

iterations.<br />

202


Figure 1 shows the total course conversion, start-up plus<br />

the recurring costs over 10 course iterations. Initially<br />

resident and ACC (within government) are similar with ACC<br />

(contractor) costs being nearly twice as much. However, when<br />

the costs of conversion and execution are amortized, ACC

(contractor) becomes less costly than resident training after<br />

four course iterations. After five iterations ACC (within<br />

government) would save 47% and ACC (contractor) would save 6%.<br />
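The amortization logic can be illustrated with a short calculation using the cost figures reported above. The resident cost per iteration is not given in this summary, so the value in the sketch below is a placeholder assumption used only to show how a break-even iteration would be located:

    # Reported ACC cost figures (dollars)
    options = {
        "ACC (within government)": {"conversion": 296_100, "startup": 73_100, "recurring": 234_400},
        "ACC (contractor)":        {"conversion": 516_200, "startup": 96_000, "recurring": 420_900},
    }

    # HYPOTHETICAL resident cost per iteration -- not reported in this paper
    RESIDENT_PER_ITERATION = 500_000

    def cumulative_cost(option, iterations):
        """Total ACC cost after a given number of course iterations."""
        return option["conversion"] + option["startup"] + option["recurring"] * iterations

    for name, opt in options.items():
        # First iteration at which the ACC option undercuts cumulative resident cost
        breakeven = next(
            (i for i in range(1, 21)
             if cumulative_cost(opt, i) < RESIDENT_PER_ITERATION * i),
            None,
        )
        print(name, "breaks even at iteration", breakeven)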

Cost-effectiveness ratios were computed by combining the<br />

cost and completion rate data. The ratio was greatest for ACC<br />

using government staff (.64), second for resident training

(.41), and lowest for ACC using contractor staff (.36).<br />

Discussion

It has been shown in this report that there is a cost-effective

alternative to sending RC soldiers to branch schools<br />

for resident training. Training by ACC can be conducted just as<br />

effectively and for less money. Thus, this technology appears<br />

to meet the need of the RC to complete educational requirements<br />

from the home or homestation, without long absences from the<br />

unit. The "electronic classroom" could be conducted remotely

from existing educational institutions such as the branch school<br />

and/or the U.S. Army Reserve Forces School in order to maintain<br />

standardized instruction.<br />

Additional research is needed, however, to improve the<br />

completion rate for ACC home study. Reasons for dropping out of<br />

the experimental course were related to limited time due to<br />

competing activities such as civilian jobs and family. A means<br />

of predicting which soldiers are likely to succeed or drop out<br />

of home study will assist Army trainers in both selecting<br />

students and providing assistance for those at high risk.<br />

References<br />

U.S. Army Training and Doctrine Command. (1989). Army Training<br />

2007. (TRADOC Pamphlet 350-4). Ft. Monroe, VA: Author.<br />

203


TEST DESIGN AND MINIMUM CUTOFF SCORES<br />

Sandra Ann Rudolph, Training Appraisal<br />

Chief of Naval Technical Training<br />

INTRODUCTION<br />

It has become increasingly obvious in the last few years that

the United States government cannot continue to operate with little<br />

concern for who will pay the bill. The apparent message is to<br />

do better with less. This means we must become more efficient in<br />

our way of conducting business. For many of us, our business is training. Being efficient means we must use our resources

wisely for the purpose intended. In training our resources are<br />

numerous--training devices, curriculum, instructors--while our

purpose is solitary--provide the training necessary for graduates<br />

to perform in the fleet. While performance is the key, there is<br />

background knowledge that is necessary for the trainee to grasp the<br />

performance.<br />

BACKGROUND<br />

In the training environment of yesterday, where money was no<br />

object, training was easier. There was little concern for statistical<br />

evaluation, effectiveness, or efficiency. We trained by<br />

the seat of our pants--experience wasn't the best teacher, it was<br />

the ONLY teacher. Today, lack of attention in these areas could<br />

mean loss of training dollars. One of the big areas of concern<br />

deals with attrition--or the dropping of trainees from a designated

training program. While there are many causes for attrition,<br />

recent attrition analysis visits to such schools as Air Traffic<br />

Control School, Music School, and Boiler Technician/Machinist Mate

School, indicate that testing programs may be at the very core of<br />

many of our problems. The following questions were used to<br />

determine how knowledge testing was being used to measure success:<br />

(1) Have critical course objectives been identified with<br />

corresponding emphasis on testing?<br />

(2) Have the knowledge tests been designed to measure the<br />

objectives to the learning level required?<br />

(3) How was the minimum cutoff score for the knowledge<br />

tests determined?<br />

(4) Has the test design and cutoff score been validated?<br />

(5) Have alternate versions of the tests been developed<br />

that are consistent with the valid test design?<br />

It became apparent that testing was a problem. It was<br />

discovered that the emphasis and training had been placed on<br />

individual test-item development and test-item analysis, not on<br />

test development and test analysis. In other words, there was no<br />

assurance that the objectives were being tested nor any evidence<br />

on how the cutoff score was determined. To standardize the<br />

approach to test design, the following process was established:<br />

204


(1) Determine criticality of the objectives.<br />

(2) Determine test design.<br />

(3) Establish a minimum cutoff score.<br />

(4) Validate the test.<br />

Criticality of the objectives

DISCUSSION<br />

The objectives of a course are those behaviors the trainee is<br />

expected to exhibit upon completion of training. Regardless of the<br />

method of development, objectives are established with varying

degrees of importance or criticality. Therefore determining the<br />

importance of the objectives must occur prior to designing the<br />

test. While there is not an established set of procedures to<br />

determine criticality, the following examples have proven to be

valid.<br />

(1) Rank ordering of objectives. Subject matter experts rank the objectives from the most important to least important. This method is most useful when courses have a small number of objectives.

(2) Yes or No. Subject matter experts determine criticality by responding "Yes" or "No". The greatest disadvantage to this approach is that some critical objectives are more critical than others and vice versa.

(3) Criticality based on a scale ranking. This method uses

a set of questions to guide in determining criticality.<br />

(a) How important is this behavior to successful<br />

performance in the fleet?<br />

(b) How difficult is the behavior to learn?<br />

(c) How important is the behavior to successful<br />

performance in the course?<br />

A scale is normally established as 0-5 or 0-10. Based on the above or similar questions, each objective is reviewed by subject matter experts, a numeric value assigned, and the average calculated.

The objectives are then ranked. Objectives falling above the<br />

established cutoff are considered critical. The cutoff score is<br />

normally a number based on the scale used. For example, any<br />

objective ranked 3 or above on a 0-5 scale might be considered

critical. This number will vary and is based on the individual<br />

course and its mission. This method provides the most complete way<br />

to determine criticality. The disadvantage is that<br />

it may be complicated and time consuming.<br />
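A minimal sketch of the scale-ranking computation is shown below; the ratings, the 0-5 scale, and the cutoff of 3 are illustrative values consistent with the example in the text, not data from an actual course:

    # Hypothetical subject matter expert ratings (0-5 scale), one list per objective
    ratings = {
        "Objective 1": [5, 4, 5, 4],
        "Objective 2": [2, 3, 2, 1],
        "Objective 3": [4, 3, 3, 4],
    }

    CUTOFF = 3.0  # illustrative; set according to the individual course and its mission

    for objective, scores in ratings.items():
        average = sum(scores) / len(scores)
        status = "critical" if average >= CUTOFF else "not critical"
        print(f"{objective}: mean rating {average:.2f} -> {status}")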

205


Test Design<br />

As with any research project, the researcher must have a plan.<br />

Without this plan, the researcher would be looking for information<br />

with little or no direction. The test design is a plan for<br />

ensuring that the objectives are tested and a plan for measuring<br />

the student's success in accomplishing the objectives.<br />

The process of designing a test builds upon the previous step<br />

of determining criticality of the objectives.<br />

There is no proven scientific method to determine the exact .<br />

test design. It is an opinion based on experience. This opinion can be strengthened through consensus. Therefore the design must be

based on the ideas of several subject matter experts and not one or<br />

two individuals. If a consensus cannot be reached, then an average<br />

should be taken. Consensus should be an underlying concern<br />

throughout the test design process. Consensus of the right persons<br />

improves the chances of producing a valid test.<br />

Step One. Group the objectives in the order in which they

will be tested. Factors to consider are:<br />

(1) The difficulty of the material needed to accomplish the<br />

objective.<br />

(2) The length of the material needed to accomplish the<br />

objective.<br />

For more difficult material, fewer objectives should be<br />

grouped for testing purposes. For example, an objective that is<br />

very difficult to accomplish may require individual testing, while<br />

several simpler objectives may be tested together. The longer it<br />

takes to teach the objective, the fewer objectives should be tested<br />

together. For example, an objective that is taught in three days<br />

may require individual testing while the objective that is taught

in three periods may be tested with other objectives.<br />

Step Two. Determine the number of test items per objective.<br />

The concern is to have enough test items on a test to ensure the<br />

measurement of each objective. Several factors to consider are:<br />

(1) Criticality of the objective. The more critical the

objective, the more items may be required.<br />

(2) Type of objective tested. If the test is comprised of both critical and noncritical objectives, normally the critical objectives should contain more items. The more items asked, the more confidence one can have that the trainee has grasped the objective.

(3) Number of objectives tested. If the test contains

several objectives, be aware of total number of items on the test<br />

and the time constraints.<br />

206


(4) Length of the material tested. If an objective can<br />

be taught in three periods, it should require fewer test items<br />

than the objective that is taught in three days.<br />

(5) Difficulty of the material. When the material is very difficult, it may require fewer items written to a much more difficult level.

Step Three. Determine the level of learning of the test<br />

items. Depending on the status of the curriculum, test items may<br />

or may not already be available. While several levels of learning<br />

exist, the following five levels are suggested for use: . _<br />

(1) Knowledge. Test items that measure a student's

ability to identify or recall specific terms, facts, rules, etc.<br />

as they are taught. Knowledge represents the lowest level of<br />

learning for a test item.<br />

(2) Comprehension. Test items that measure a student's<br />

ability to grasp the meaning of material. This may be done by<br />

interpreting, explaining, or translating information. This is a<br />

higher level of learning than knowledge, but the lowest level of<br />

understanding.<br />

(3) Application. Test items that measure the student's

ability to use learned material in new and concrete situations.<br />

This type of test item requires a higher level of understanding<br />

than comprehension.<br />

(4) Analysis. Test items that measure the student's<br />

ability to break down material into components so that an<br />

organizational structure may be understood. This may require the<br />

identification of parts, analysis of relationships between parts, and recognition of the organizational principles involved. These types of test items represent a higher level of learning than

comprehension and application because they require an<br />

understanding of both the content and the structural form of the<br />

material.<br />

(5) Evaluation. Test items that measure the student's<br />

ability to judge the value of material for a given purpose. The<br />

judgements are based on definite criteria. This type of test<br />

item represents the highest learning level because it contains<br />

all the other categories.<br />

When determining the learning level to which the test item should be written, the objective must be reflected. The

following factors should be considered:<br />

(1) Test items must be written that support the objective.<br />

This means that if the objective calls for a basic knowledge of<br />

the material, the test items should be written to the knowledge<br />

learning level.<br />

207



(2) If the objective calls for an understanding of the

material, then the test item should be written to one of the<br />

higher learning levels.<br />

(3) If the objective calls for a higher learning level, not all test items should be written to the highest level.

Enough must be on the test to ensure that the student has met the<br />

objective to the learning level required.<br />

Step Four. Select appropriate test items from the test bank<br />

or develop test items. If a test bank is already in existence,<br />

each item must be cross-referenced to the objective it<br />

supports and a level of learning identified.<br />

If the test bank<br />

does not have an adequate number of items, new items may be<br />

required. If it appears that new items that support the objective<br />

are difficult to prepare, the plan may need to be altered.<br />

Step Five. Establish a minimum cutoff score. Setting a cutoff

score means that a point must be determined that differentiates<br />

between the student that has achieved the objective and the student<br />

that has not. If the first four steps have been followed, it is<br />

safe to assume that the test has content validity. If there is any<br />

doubt, the test should be reviewed again before attempting to<br />

establish the minimum cutoff score.<br />

Setting a cutoff score, as with the other steps, is a judgmental process. While several methods of establishing the minimum

cutoff score exist, the following methods are suggested.<br />

METHOD 1<br />

(1) A panel of subject matter experts is selected based on

their current knowledge of the job and the performance required<br />

of the graduate in the fleet.<br />

(2) A discussion should be conducted centering on what constitutes a minimally competent person. Care should be taken not to allow one person to dominate the discussion; the goal should be one of consensus. The discussion is designed to get all

members thinking along the same lines.<br />

(3) Next, the technique of establishing the cutoff score<br />

should be explained.<br />

(a) Review each test item on the test.<br />

(b) Check items that the student with minimum<br />

competency should be expected to know. Care<br />

should be taken that this is not what the average<br />

student will know, or what the subject matter<br />

expert would like for them to know.<br />

(c) If there are any items that the student must know,<br />

these items will be noted.<br />

(d) Add the number of checks for each objective.<br />

208<br />



(e) Average the total responses and this becomes the<br />

minimum cutoff score for the objective.<br />

METHOD 2<br />

(a) With this method, subject matter experts determine<br />

the percentage of the students that should answer<br />

the test item correctly. Again this is dealing<br />

with minimum competency.<br />

(b) An average of the percentages yields the minimum<br />

cutoff score.<br />
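Both methods reduce to simple averaging over subject matter expert judgments. The sketch below uses invented SME inputs to show Method 1 (each expert checks the items a minimally competent student should know; the counts are averaged) and Method 2 (each expert estimates the percent of minimally competent students expected to answer each item correctly; the estimates are averaged):

    # --- Method 1: items checked by each SME for a hypothetical 10-item test ---
    sme_checks = [
        {1, 2, 3, 5, 6, 8, 9},
        {1, 2, 4, 5, 6, 8},
        {1, 2, 3, 5, 6, 7, 8, 10},
    ]
    cutoff_method1 = sum(len(checks) for checks in sme_checks) / len(sme_checks)
    print(f"Method 1 cutoff: {cutoff_method1:.1f} items out of 10")

    # --- Method 2: estimated percent correct per item, one list per SME ---
    sme_percentages = [
        [90, 80, 70, 60, 85, 75, 65, 80, 90, 70],
        [85, 85, 60, 65, 80, 70, 70, 75, 85, 75],
    ]
    all_estimates = [p for sme in sme_percentages for p in sme]
    cutoff_method2 = sum(all_estimates) / len(all_estimates)
    print(f"Method 2 cutoff: {cutoff_method2:.1f}%")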

Regardless of the method used, there are never hard and firm criteria for what is competency and what is not. Some

students are clearly competent based on their scores. Some<br />

students are clearly not competent based on their scores. There<br />

is a certain group of students that may meet the cutoff score and<br />

not be competent. There is normally an equal number of students<br />

that do not meet the cutoff score that are competent.<br />

In the final analysis of the cutoff score, it comes to a<br />

decision concerning which is the greater danger: to fail a

qualified person or to pass an unqualified person. For progress<br />

tests, it is probably alright to pass an unqualified person. For<br />

exit exams, particularly when safety is a factor, it is probably<br />

better to fail a qualified person than to pass an unqualified<br />

person.<br />

Step Six. Validation process. Content validity has already<br />

been established. Validating the minimum cutoff score is a process<br />

achieved over time by administering the test and plotting the<br />

scores. If the scores indicate that almost all students are passing, the cutoff score may be too low. This is true only if noncompetent students are passing. If all the students who pass are

competent, then the cutoff score may be acceptable. If the scores<br />

indicate that most students fail, the cutoff score may be too high.<br />

SUMMARY<br />

In conclusion, the process is being tested. Training has been<br />

provided to all the sites where attrition analysis visits have been<br />

conducted. Since the training is recent, it is difficult to assess<br />

what impact the process has had on attrition. While attrition has<br />

been lowered in each case, it is not possible to pinpoint any specific cause. One thing we feel confident about is that this

process leads to better test validity and that the objectives are<br />

being measured to the degree specified.<br />


209<br />




Subjective and Cognitive Reactions to Atropine/2-PAM,<br />

Heat, and BDU/MOPP-IV<br />

John L. Kobrick, Richard F. Johnson, and Donna J. McMenemy<br />

US Army Research Institute of Environmental Medicine<br />

Natick, Massachusetts 01760-5007<br />

The current US armed forces nerve agent antidote is a<br />

combination of 2 mg atropine sulfate (atropine) and 600 mg<br />

pralidoxime chloride (2-PAM) administered by paired intramuscular<br />

injections. Although these drugs provide good physical<br />

protection, they have side effects which could lead to adverse<br />

subjective reactions and impaired performance (Taylor, 1980).

The major physiological reactions to atropine alone<br />

(Marzulli & Cope, 1950), and to atropine in combination with heat

stress (Kolka, Stephenson, Bruttig, Cadarette, & Gonzalez, 1987)<br />

have been identified. Effects on psychological, perceptual, and<br />

cognitive behavior are less clear, although some performance-oriented studies have been reported (Baker et al., 1983; Moylan-

Jones, 1969; Penetar & Henningfield, 1986; Wetherell, 1980). The<br />

physiological effects of 2-PAM alone and in combination with atropine have also been studied (Holland, Kemp, & Wetherell, 1978). Much less is known about associated psychological and perceptual effects (Headley, 1982), although such knowledge is

essential in view of their paired use as the standard nerve agent<br />

antidote.<br />

Chemical warfare in tropic and desert areas also creates<br />

problems due to heat stress, especially when troops must wear<br />

MOPP-IV chemical protective clothing, since the total<br />

encapsulation of that ensemble traps heat and body moisture.<br />

This paper reports subjective symptoms, mood changes, and<br />

cognitive performance observed during a research project on the<br />

effects of heat exposure, atropine/2-PAM administration, and<br />

wearing of both the BDU and MOPP-IV ensembles. The overall<br />

project consisted of two separate studies which were identical<br />

except that the BDU ensemble was worn in one of the studies, and<br />

the MOPP-IV ensemble was worn in the other study.<br />

Study 1. Effects of Atropine/2-PAM and Heat on Symptomatic, Mood,<br />

and Cognitive Reactions While Wearing the BDU Ensemble<br />

Method<br />

Fifteen male soldiers, ages 18-32 years, were screened<br />

medically and were tested for normal vision and hearing. They<br />

were trained intensively 6 hours daily for 5 consecutive days on a battery of performance tasks and then performed the tasks on 4 separate test days, each day corresponding to one of the following experimental test conditions: (a) control (saline placebo, 70°F [21.1°C], 30% RH); (b) drug only (2 mg atropine, 600 mg 2-PAM, 70°F [21.1°C], 30% RH); (c) ambient heat only (saline placebo, 95°F [35°C], 60% RH); and (d) drug and ambient




heat (2 mg atropine, 600 mg 2-PAM, 95°F [35°C], 60% RH). On each

test day, the soldiers received either atropine/2-PAM or<br />

equivalent volumes of saline placebo, injected into the thigh<br />

muscle by 22-gauge syringes. Drug conditions were double-blind;<br />

however, the study medical monitor knew the identities of both<br />

drug and placebo participants. Test days were separated by at<br />

least three days for recovery from the preceding drug conditions.<br />

Daily testing began 30 min after drug administration.<br />

Participants attempted to complete three cycles of the<br />

performance tasks each testing day, and performed until either<br />

they withdrew voluntarily or were removed by the medical monitor.

Cycles began at standard 2-hr intervals to maintain uniformity of<br />

daily heat exposure. Participants were allowed to drink water ad<br />

lib from standard military canteens; lunch and snacks were<br />

omitted.<br />

Three subjective tests were administered periodically during<br />

each experimental session: (a) the US Army Research Institute of<br />

Environmental Medicine Environmental Symptoms Questionnaire (ESQ;<br />

Kobrick & Sampson, 1979), as modified by Kobrick, Johnson, and McMenemy (1988); (b) the Profile of Mood States (POMS; McNair, Lorr, & Droppelman, 1981); and (c) the Brief Subjective Rating Scale (BSRS; Johnson, 1981). The ESQ is a self-rating inventory

for sampling subjective reactions and medical symptomatology<br />

during exposure to environmental and other stressors. The POMS

is a rating scale of 65 items to assess 6 mood states (tension,<br />

depression, anger, vigor, fatigue, confusion). The BSRS<br />

appraises subjective feelings of warmth, discomfort, and<br />

tiredness on separate rating scales by selection of descriptive<br />

words or phrases. The ESQ and POMS were given once at the end of<br />

each daily session. The BSRS was given once at the beginning of<br />

each session (30 min post-injection) and once at the end of each<br />

cycle (150, 270 and 390 min post-injection).<br />

Participants performed the following cognitive tasks in each<br />

2-hour testing cycle: (1) verbal reasoning - judging the correctness of grammatical transformations (Baddeley Grammatical Reasoning Test, 1968); (2) simple reaction time - pressing a key to the onset of a signal light; (3) choice reaction time - pressing one of two keys to the onset of one of two signal lights; (4) digit-symbol substitution - substituting code symbols for their symbol counterparts (Digit Symbol Substitution Test, Wechsler, 1955); and (5) speech intelligibility - correctly identifying spoken words among other similar words (Modified Rhyme Test, House et al., 1965).

Results<br />

The group mean ratings for each of the 68 ESQ items in each<br />

of the four test conditions showed the fewest severe symptoms in<br />

the control condition. The two heat conditions generated more<br />

symptoms, and different patterns of incidence related to heat.<br />

Atropine/2-PAM generated high ratings on symptoms usually<br />

attributed to those drugs (dry mouth, thirst). Drug/heat, the<br />

most severe test condition, generated the greatest number of high<br />

211<br />



ratings. Headache and lightheadedness were reported only under

drug/heat.<br />

Two-way (Temperature x Drug) analyses of variance (ANOVAs)<br />

on the POMS ratings showed significant main effects for both<br />

temperature and drug, acting to increase tension (F(1,14) = 5.36, p


On the POMS, two-way ANOVAs for repeated measures on each of<br />

the scores showed significant drug and temperature main effects

and significant Drug x Temperature interactions, indicating that<br />

the drug led to feelings of tension (F(1,7) = 7.06, p


performed, to elicit early reactions prior to withdrawal. The ESQ and POMS could not be analyzed in this manner because they were given only once at the end of each test day. Significant temperature main effects were found for warmth (F(1,7) = 37.19, p


Journal of Clinical Pharmacology, 2, 367-368.

House, A. S., Williams, C. E., Hecker, M. H. L., & Kryter, K. (1965). Articulation testing methods: Consonantal differentiation with a closed response set. Journal of the Acoustical Society of America, 37, 158-166.

Johnson, R. F. (1981). The effects of elevated ambient temperature and humidity on mental and psychomotor performance. In Handbook of the Thirteenth Commonwealth Defense Conference on Operational Clothing and Combat Equipment (pp. 152-153). Kuala Lumpur, Malaysia: Government of Malaysia.

Kobrick, J. L., Johnson, R. F., & McMenemy, D. J. (1988). Nerve agent antidotes and heat exposure: Summary of effects on task performance of soldiers wearing BDU and MOPP-IV clothing systems (Technical Report T1-89). Natick, MA: US Army Research Institute of Environmental Medicine. (DTIC Accession No. A206-222)

Kobrick, J. L., Johnson, R. F., & McMenemy, D. J. (1990). Effects of nerve agent antidote and heat exposure on soldier performance in the BDU and MOPP-IV ensembles. Military Medicine, 155, 159-162.

Kobrick, J. L., & Sampson, J. B. (1979). New inventory for the assessment of symptom occurrence and severity at high altitude. Aviation Space and Environmental Medicine, 50, 925-929.

Kolka, M. A., Stephenson, L. A., Bruttig, S. P., Cadarette, B. S., & Gonzalez, R. R. (1987). Human thermoregulation after atropine and/or pralidoxime administration. Aviation Space and Environmental Medicine, 58, 545-549.

Marzulli, F. N., & Cope, O. P. (1950). Subjective and objective study of healthy males injected intramuscularly with 1, 2 and 3 mg atropine sulfate (Medical Division Research Report No. 24). Aberdeen, MD: US Chemical Corps, Army Chemical Center.

McNair, D. M., Lorr, M., & Droppelman, L. F. (1982). EITS manual for the Profile of Mood States. San Diego, CA: Education and Industrial Testing Service.

Moylan-Jones, R. J. (1969). The effect of a large dose of atropine upon the performance of routine tasks. British Journal of Pharmacology, 37, 301-305.

Penetar, D. M., & Henningfield, J. E. (1986). Psychoactivity of atropine in normal volunteers. Pharmacology and Biochemistry of Behavior, 24, 1111-1113.

Taylor, P. (1980). Anticholinesterase agents. In A. G. Gilman, L. S. Goodman, & A. Gilman (Eds.), The pharmacological basis of therapeutics (6th ed., pp. 100-119). New York: Macmillan.

Wetherell, A. (1980). Some effects of atropine on short-term memory. British Journal of Clinical Pharmacology, 10, 627-628.

215


GUTS: A BELGIAN GUNNER TESTING SYSTEM

F. LESCREVE
CRS - Belgian Armed Forces, Brussels

W. SLOWACK
CRS - Belgian Armed Forces, Brussels

1. Introduction<br />

To fulfil the need for expert selection of gunners, the Belgian Army

has developed a selection-simulator. First a job analysis of different<br />

weapon systems was completed. This was the basis for the construction of

GUTS. Different physical and psychological stressors are important.<br />

2. Theoretical Background<br />

GUTS is constructed from a holistic point of view. We chose to put<br />

the candidate gunners in a complete, real life-like situation instead of<br />

confronting them with different subtasks from the gunner's job, one at a time.

3. Job Analysis<br />

The following weapon systems were carefully observed to extract the crucial task components:

- Leopard-tank<br />

- CVRT (Combat Vehicle Reconnaissance, Tracked)

- GEPARD (Anti-Aircraft tank)<br />

- HAWK (Anti-Aircraft Missile)<br />

- JPK (Jacht Panzer Kanone)<br />

- AIFV (Armored Infantry Fighting Vehicle)
- MILAN (Missile Léger Antichar)

a. Tasks<br />

The following tasks were common to practically all the weapon<br />

systems.<br />

1. Knowledge of procedures.

2. Ranging and target recognition.<br />

3. Target engagement and acquisition.<br />

4. Target identification.<br />

5. Choice of ammunition.<br />

6. Loading of ammunition.<br />

7. Tracking and firing.<br />

The working space of the gunner was measured. For the construction<br />

of the simulator we took the average of the measures of the different<br />

weapon systems.<br />

216


b. Stressors

An analysis was made of the possible physical and psychological

stressors.<br />

1. Physical Stressors<br />

1. Limited working space.<br />

2. Heat caused by instruments, engine, clothing.

3. Vibration due to the vehicle movements.<br />

4. Noise, especially in a war situation.<br />

5. Darkness.<br />

2. Psychological Stressors<br />

1. Overload of information, visual and auditory.<br />

2. Permanent concentration needed.<br />

3. Time pressure.

4. Unexpected events.<br />

5. Feeling of isolation.
6. Feeling of claustrophobia.

C. Ability and Aptitude Requirements<br />

Based on the task analysis and the inventory of the different stressors, it is clear that several ability and aptitude requirements are needed for being a good gunner. As we shall see later, the different requirements are also needed for a good performance in GUTS.

Requirements :<br />

- Learning ability.<br />

- Memory.<br />

- Reaction time.<br />

- Motor coordination.<br />

- Stress management.<br />

- Concentration.<br />

4. Construction of GUTS<br />

The aim was to incorporate the different tasks and stressors in the<br />

selection-simulator. We did this in the following design.



a. The Cabin

For the size of the working space we used the average of the<br />

measurements of the different weapon systems. The following sizes were taken into consideration: the depth of the working space, the space for the head movements, the size of the seat, the distance between look-hole and handle, the distance between head and top of the cabin, the space for the legs, and the distance between seat and top of the cabin.

b. The Instruments<br />

The instruments in the simulator are life-like copies of real<br />

instruments. We discuss here only the most important ones, which are indicated in the figure.

1. Lookhole : Through the lookhole you can see the screen of the computer, on which you see a landscape with the different targets. You also see the circle for the engagement of the targets and the reticle for the tracking of the targets.

2. Identification box : This box is used for identifying the targets: 6 possibilities.

3. Control box : With this box you start the whole procedure.<br />

4. Radio box : Here the headphones for the candidate are connected. In this way the candidate receives his weapon control orders. There are also a lot of disturbing sounds and unimportant conversations on the radio.

5. Ammunition box : To choose the type of ammunition depending on<br />

distance and sort of the target : 4 possibilities.<br />

6. Heating device : By means of a thermostat the temperature in the simulator rises to 30° C.

11. Loudspeakers : The loudspeakers at the bottom of the cabin

produce war-sounds. By making low frequency sounds<br />

they produce a disturbing vibration.<br />

15. Handle : The candidate must use the handle in order to engage<br />

a target with the circle he has on his screen,<br />

tracking a target by means of the reticle and firing<br />

with the fire buttons. The handle can move in all<br />

directions.<br />


The candidate-gunner has to wear a gasmask. This is connected by a

tube to a valve. Every five minutes the air supply is cut off for five<br />

seconds.<br />

5. The Test Session

a. Learning the Test Instructions

The goal of the test session is explained to the candidate. He gets a description of all the instruments. He has to learn the different kinds of tanks, the ammunition, the identification procedures, and the engagement procedures. Special attention is paid to the weapon control

orders (WCO).<br />

218


b. The Test

After a short demonstration of the instruments of the simulator, the<br />

candidate puts on a gasmask and a battle dress and climbs into the cabin. The actual test consists of 3 identical cycles. Every cycle has 4

periods, one period for each WCO. The test takes 30 minutes.<br />

C. The Engagement-firing Cycle<br />

To eliminate a target, a candidate must follow a strict procedure.<br />

1. Engagement of the target : steering the circle on the target<br />

and pushing the engagement buttons on the handle.<br />

2. Selection of ammunition, depending upon the kind of target and<br />

the distance of the target.<br />

3. Loading the ammunition<br />

4. Confirmation of the ammunition

5. Aiming by putting the reticle on the target.<br />

6. Firing, by pressing the firing buttons on the steering gear.

The engagement-firing cycle must be repeated for each target. The<br />

computer (APPLE MAC II) registers all the actions of the candidate.

d. Discussion<br />

A test session in GUTS is a demanding experience. This is caused by the

physical and the psychological stressors. The candidates come out<br />

sweating.<br />

6. Results<br />

The candidates are scored in the following categories:

a. Time between appearance of a target and firing at a target.<br />

b. Decision errors : engaging the wrong target.<br />

c. Manipulation errors : for example choosing the wrong ammunition.

d. Procedure errors : errors made in the engagement-firing cycle.<br />

e. Number of hits.
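As an illustration of how a logged session might be reduced to these score categories, the sketch below scores a small invented event log; the event format and field names are assumptions for illustration, not the actual GUTS software, and procedure errors (deviations from the six-step engagement-firing cycle) are omitted for brevity:

    # Hypothetical event log: (seconds since target appeared, event type, correct?)
    session_log = [
        (2.1, "engage", True),
        (3.4, "ammunition", False),   # manipulation error: wrong ammunition chosen
        (4.0, "ammunition", True),
        (5.2, "fire", True),          # hit
        (1.8, "engage", False),       # decision error: wrong target engaged
        (6.9, "fire", False),         # miss
    ]

    times_to_fire = [t for t, event, _ in session_log if event == "fire"]
    decision_errors = sum(1 for _, e, ok in session_log if e == "engage" and not ok)
    manipulation_errors = sum(1 for _, e, ok in session_log if e == "ammunition" and not ok)
    hits = sum(1 for _, e, ok in session_log if e == "fire" and ok)

    print("Mean time to fire:", sum(times_to_fire) / len(times_to_fire))
    print("Decision errors:", decision_errors)
    print("Manipulation errors:", manipulation_errors)
    print("Hits:", hits)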

7. Validation of GUTS<br />

In 1991 a study will be carried out concerning the reliability and<br />

validity of GUTS.<br />

219<br />



CHARACTERIZING RESPONSES TO STRESS UTILIZING DOSE EQUIVALENCY METHODOLOGY

Robert S. Kennedy, Essex Corporation; William P. Dunlap, Tulane University; Janet J. Turnage, University of Central Florida;

Jennifer E. Fowlkes, Essex Corporation*, Orlando, FL<br />

INTRODUCTION

One of the chief problems in quantifying the effects of stressors on<br />

operational performance, such as heat and combat stress, is the lack of<br />

reliability in the criterion tasks. To circumvent the problems which hinder<br />

development of a quantitative definition of workforce performance decrement,<br />

we offer two methodologies: surrogate measurements and dose equivalency<br />

testing.<br />

Surrogate Measurement

Insufficient attention to reliability can lead to attenuated validities,<br />

reduction of statistical power, higher sample size requirements, increased<br />

cost of experiments, and when hazard or discomfort is involved, human use<br />

problems. These problems focused us on development of highly reliable measure<br />

sets such as may be obtained with microcomputer-based mental acuity tests<br />

(Kennedy, Baltzley, Lane, Wilkes, & Smith, 1989; Kennedy, Wilkes, Lane, &<br />

Homick, 1985). We recognized these are separate from the operational<br />

criteria, but highly similar to the criteria in skill requirements. We<br />

reasoned that, if the measures correlate well with the criteria and behave<br />

similarly under changing task conditions, perhaps they could be used as a<br />

surrogate in place of the criteria. We called this approach “surrogate<br />

measurement” ( Lane, Kennedy, & Jones, 1986) and listed requirements for<br />

surrogate tests as those which are related to or predictive of real-world<br />

performances but are not actual measures of the performance per se. In our<br />

plan, surrogate measures are composed of tests or batteries that exhibit five<br />

characteristics:<br />

1. Stable so that the “what is being measured” is constant:<br />

2. Correlated with the operational performance;<br />

3. Sensitive to the same factors that would affect performance as the<br />

performance variable would:<br />

4. More reliable than field measures; and<br />

5. Involving minimal training time.<br />

An obvious candidate for a surrogate to measure military performance would<br />

be the Armed Services Vocational Aptitude Battery (ASVAB). ASVAB scores are<br />

used to determine eligibility for various military occupational specialties<br />

based on construct validity and continuing programs of empirical studies. The<br />

tests of the ASVAB also have considerable content and criterion-related<br />

validity, including training performances at military formal schools as well<br />

as operational performance studies. In at least one case (Wallace, 1982),

performances during war games with tank forces were correlated with subtest<br />

scores from the ASVAB better than with any other variable in the study. But<br />

the ASVAB is not meant to be administered repeatedly. If it could be shown that the ASVAB was highly correlated with a repeated measurement test battery,

-<br />

*Dr. Fowlkes is now employed at Engineering and Economics Research, Orlando, FL<br />


220<br />



then by the principle of transitivity (things equal to the same thing are<br />

equal to each other), one might link changes in the surrogate with changes in<br />

the operational performance about which one wished to make statements.<br />

Dose Equivalency<br />

A second methodology that can be employed in studies of real-world<br />

performance is called “dose equivalency.” Dose Equivalency is a strategy used<br />

in conjunction with surrogate measures in order to quantify degradation of<br />

operational performance by selecting an indexing agent(s) or treatment and a<br />

set of target performance tasks. Then graded “dosages” of the indexing agent<br />

are administered and performance decrements as a function of the indexing<br />

agent are marked or calibrated against the various dosages. One is left with<br />

a functional relationship between an agent and performance(s).<br />

This strategy has been applied in a study we conducted using different<br />

dosages of orally-administered alcohol (Kennedy, Baltzley, Lane, Wilkes, &

Smith, 1989). Alcohol was selected as the indexing agent for several<br />

reasons : 1) alcohol is known to be a global depressant, having wide-ranging<br />

and well-documented impacts on performance and operational readiness has been<br />

directed to the identification and calibration of what are to be considered<br />

“safe” and “unsafe” doses of this agent, 3) equipment and assay procedures are<br />

readily available for calibrating both blood alcohol levels (BALI and alcohol<br />

detected in expired breath (breathalyzer) and 4) because alcohol is widely<br />

used, it is feasible to administer to male subjects who, by self-report, use<br />

alcohol to a moderate degree, thereby obviating potential threat to volunteers<br />

and meeting requirements for ethical treatment of subjects in human<br />

experimental research.<br />

EXPERIMENT 1 - APTS AS SURROGATE FOR ASVAB SUBTEST<br />

In this analysis, 16 women and 11 men ranging in age from 18 to 38 were<br />

tested with a synthetic Armed Services Vocational Aptitude Battery (ASVAB)<br />

(Steinberg, 1986), and the microcomputer-based assessment used was the<br />

Automated Performance Test System (APTS), which is fully described in Kennedy<br />

et al. (1989). Seven of the tests used were from the original APTS (Pattern Comparison; Two-Finger and Nonpreferred Hand Tapping; Code Substitution; Simple Reaction Time; Grammatical Reasoning; and Manikin) and four additional

subtests were selected from the Unified Tri-Service Committee Performance<br />

Assessment Battery (UTC-PAB) (Englund, Reeves, Shingledecker, Thorne, Wilson,<br />

& Hegge, 1987).

The most dramatic findings were the consistently high reliabilities of the<br />

battery subtests; the smallest reliability was 0.85, which in our judgment is<br />

sufficient for statistical power and differential purposes.<br />

Scores on the performance battery were averaged across the seven trials<br />

and then correlated with the subscales and total score from the ASVAB.<br />

Multiple regression was used to examine the predictive power of the battery as<br />

a whole on the total ASVAB criterion. The multiple R was 0.94 (R2 = 0.88), and, even when corrected for shrinkage, the multiple R was 0.88. This indicates that when shrinkage owing to the particular sample used is taken into account, 77% of the ASVAB variance is explained by performance on the

battery subtests.<br />
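The size of the shrinkage correction can be approximately reconstructed with a standard adjustment. The sketch below applies a Wherry-type adjusted R-squared with the 27 subjects and 11 subtests described above; the paper does not state which correction formula was actually used, so this is only an approximation:

    def shrunken_r(multiple_r, n, k):
        """Wherry-type shrinkage-adjusted multiple correlation."""
        r2 = multiple_r ** 2
        adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
        return adjusted_r2 ** 0.5

    # 27 subjects (16 women, 11 men), 11 performance subtests, reported R = 0.94
    print(round(shrunken_r(0.94, n=27, k=11), 2))  # about 0.89, close to the reported 0.88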

221<br />



A second multiple regression analysis was conducted including the candidates' surrogate performance subtests - those that would be used in the second study - Code Substitution, Pattern Comparison, Grammatical Reasoning, Manikin, Math Processing, Two-Finger Tapping, Non-Preferred Hand Tapping, and Reaction Time. The multiple R was .92, indicating that we lost very little

common variance with the ASVAB by using the shortened surrogate battery.<br />

EXPERIMENT 2 - INDEXING AGENT (ALCOHOL) ADMINISTERED TO EXPERIMENTAL SAMPLE*

Method

Subjects. Male students, ranging in age from 21 to 42, were recruited as<br />

subjects. Acceptable candidates were those indicating some, but not excessive, experience with alcohol, no past history of chronic dependency of any type, good general health, and indications of low risk for future

alcohol-based problems. Students indicating problem family histories of<br />

chemical abuse/dependency and/or past personal histories of chemical<br />

abuse/dependency were advised not to participate. Various paper-and-pencil<br />

and computer software materials were employed in screening and assessing the<br />

individual subjects and are discussed in detail elsewhere along with the

criteria employed in Kennedy, Baltzley, Lane, Wilkes, and Smith (1989).<br />

Microcomputer testing was conducted with eight identical NEC PC8201A microcomputers, and the Intoximeter Model 3000 breath analyzer was used to

estimate alcohol concentrations in the blood.<br />

Procedure<br />

Alcohol was consumed in a group setting with subjects completing the<br />

drinking in anywhere from several minutes to slightly more than one hour. Each subject

was brought to 0.15 BAC and monitored as the descending limb of the BAL curve<br />

was achieved.<br />

Upon completing data collection, subjects were returned to supervised<br />

housing where they were required to stay for the remainder of the evening and<br />

abstain from further consumption of alcohol. Upon wakening the following<br />

morning, subjects self-administered one battery of the APTS. This “hangover”<br />

measure was completed within one hour of wakening and all measures were<br />

finalized by 9:30 A.M. The hangover measure typically occurred within 13 to<br />

17 hours of the pre-alcohol APTS measure taken the previous day.<br />

Results<br />

The means for each of the APTS performance measures were monotonically related to blood alcohol levels and all were significant (p < .001). Figure 1 depicts the performance measures for a sample subtest (Code Substitution) in

*Many of the technical details regarding methods, procedures, and safeguards in studying the effects of orally administered alcohol on APTS performance were worked out in previous research and are described extensively elsewhere (Kennedy, Wilkes, and Rugotzke, 1989).



the order they were obtained. Following the alcohol challenge, performance dropped dramatically on all subtests, then recovered, in most cases in a monotonic or near-monotonic function, as determined by BAL during the period of alcohol metabolism. If one were to choose a single subtest to index BAL, Code Substitution would be a likely candidate. For this test it can be seen that each change of one hundredth of a percent BAL is indexed by a change of approximately 1.5 points on the Code Substitution task.

Figure 1. Code Substitution Number Correct for<br />

Baseline and Blood Alcohol Levels As Shown<br />

Formulation of the Quantitative Dose Equivalency Model<br />

Multiple regression was used to develop scores that maximally predicted BAL. The multiple R between BAL and its prediction from all nine surrogate battery subtests was 0.77. Subsequent stepwise regression analysis revealed that an optimally selected subset of only four of the subtests produced a multiple R of 0.765; therefore, virtually no loss in predictive power resulted from use of the shortened battery. When this latter coefficient is corrected for shrinkage, R equalled 0.75; therefore, 57% of the variance in blood alcohol is predictable from the four-subtest battery. The resulting

regression equation (simplified by rounding to whole numbers) is shown in<br />

Equation (1):<br />

BAL = 0.3 - (9CS + 2GR + 5MP + 6TFT)/1000,     (1)

where CS, GR, MP, and TFT refer to percent decrement from baseline of Code

Substitution, Grammatical Reasoning, Math Processing, and Two-Finger Tapping,<br />

respectively.<br />
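The following is a literal transcription of Equation (1) into code; the decrement values passed in are arbitrary placeholders, since the text does not report individual decrement scores or their exact scaling.

```python
# Equation (1) coded directly. CS, GR, MP, and TFT are percent decrements from
# baseline on Code Substitution, Grammatical Reasoning, Math Processing, and
# Two-Finger Tapping; the example inputs are made up solely to show the
# arithmetic, not to represent observed data.

def predicted_bal(cs, gr, mp, tft):
    """Predicted blood alcohol level from Equation (1)."""
    return 0.3 - (9 * cs + 2 * gr + 5 * mp + 6 * tft) / 1000.0

print(round(predicted_bal(cs=10.0, gr=15.0, mp=8.0, tft=6.0), 3))  # hypothetical inputs
```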


To further demonstrate how the four-test surrogate battery surfaced by the above research can serve as a bridge between alcohol (the indexing agent) and military performance readiness (the synthetic ASVAB), we computed one further regression equation from the synthetic ASVAB data described above. Equation (2) predicts the ASVAB (scaled with mean = 100 and SD = 15) from Code Substitution, Grammatical Reasoning, and Math Processing. Two-Finger Tapping was not used as its beta weight in the equation was quite low. The equation is:

ASVAB = .92CS + .42MP + .15GR + 26 (2)<br />

where CS, MP, and GR are raw scores for Code Substitution, Math Processing,<br />

and Grammatical Reasoning, respectively. Using this equation to fit the data<br />

from the alcohol study, we can represent the performance decrements produced

by the various BAL levels relative to a metric based on a standardized ASVAB<br />

as follows. These relationships are shown in Table 1.<br />

Table 1. Predicted Standardized ASVAB Means and Standard Deviations<br />

from Surrogate Battery Performance as a Function of Blood Alcohol Level<br />

BAL          Mean ASVAB      SD
Baseline        103.6       12.3
0.050           104.7       13.4
0.075           101.8       12.7
0.100            96.5       13.9
0.125            90.3       15.9
0.150            85.9       13.4
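Equation (2) can be transcribed in the same way; the raw subtest scores below are hypothetical, chosen only to show how a baseline-versus-dosed comparison would be expressed on the standardized ASVAB metric summarized in Table 1.

```python
# Equation (2) coded directly: a synthetic-ASVAB score (mean 100, SD 15)
# predicted from raw scores on three surrogate subtests. The raw scores are
# invented for illustration; only the coefficients come from Equation (2).

def synthetic_asvab(cs, mp, gr):
    return 0.92 * cs + 0.42 * mp + 0.15 * gr + 26

baseline = synthetic_asvab(cs=55, mp=60, gr=40)   # hypothetical baseline raw scores
dosed = synthetic_asvab(cs=48, mp=52, gr=35)      # hypothetical post-dose raw scores
print(f"baseline {baseline:.1f}, dosed {dosed:.1f}, decrement {baseline - dosed:.1f}")
```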

CONCLUSION<br />

The objective of the effort reported herein was to provide a quantitative<br />

methodology to permit assessment of performance degradation in humans which<br />

may result from exposure to toxic or stressor agents encountered on the battlefield. The scientific literature has shown that performance on the

ASVAB is correlated with military job performance, and tests from a<br />

microcomputer test battery have been developed which are sensitive to gases<br />

like halon and to various toxic agents. Using these relations, the present<br />

analyses were performed, the results of which are:<br />

o Performances on APTS subtests are correlated with subtests of a synthetic ASVAB.

o Increasing dosages of alcohol result in monotonically greater performance decrements on APTS subtests.

o The performance decrements can be indexed to percent blood alcohol via a linear regression equation.

o Performance decrement on APTS can be indexed to performance decrement on aptitude tests via a linear regression equation.

o Performance equivalency and dose equivalency relationships were successfully demonstrated so that:



a regression equation can be created which translates reductions in<br />

APTS performance due to any treatment (such as an irritant gas or<br />

psychological stress) into ASVAB equivalent performances, and

a regression equation can be created which translates reductions in<br />

performance due to such agents or treatments into units of percent

blood alcohol.<br />

ACKNOWLEDGMENTS<br />

Support for this research was from the U.S. Army Medical Research Acquisition Activity under Contract DAMD17-89-C-9135. The views, opinions, and/or findings contained in this report are those of the authors and should not be construed as an official Department of the Army position, policy or decision unless so designated by other documentation. The authors are

indebted to Gene G. Rugotzke for conducting the blood alcohol analysis and to<br />

Robert L. Wilkes for collection of APTS data.<br />


REFERENCES<br />

Englund, C. E., Reeves, D. L., Shingledecker, C. A., Thorne, D. R., Wilson,<br />

K. P., & Hegge, F. W. (1987). Unified Tri-Service Cognitive Performance
Assessment Battery (UTC-PAB): I. Design and specification of the battery,

Report No. 87-10. San Diego, CA: Naval Health Research Center.<br />

Kennedy, R. S., Baltzley, D. R., Lane, N. E., Wilkes, R. L., & Smith, M. G.
(1989). Development of microcomputer-based mental acuity tests: Indexing
to alcohol dosage and subsidiary problems (Final Report, Grant No. ISI-

8521282). Washington, DC: National Science Foundation.<br />

Kennedy, R. S., Wilkes, R. L., Lane, N. E. & Homick, J. L. (1985). Preliminary<br />

evaluation of a microbased repeated measures testing system,
Technical Report (EOTR 85-1). Orlando: Essex Corporation.

Kennedy, R. S., Wilkes, R. L., & Rugotzke, G. G. (1989). Cognitive performance
deficit regressed on alcohol dosage. Proceedings of the 11th International
Conference on Alcohol, Drugs, and Traffic Safety (p. C-27).

Chicago, IL.<br />

Lane, N. E., Kennedy, R. S., & Jones, M. B. (1986). Overcoming unreliability in
operational measures: The use of surrogate systems. Proceedings of the
30th Annual Meeting of the Human Factors Society. Santa Monica, CA: Human

Factors Society.<br />

NEC Home Electronics (USA). (1983). NEC PC-8201A Users Guide. Tokyo: Nippon<br />

Electric Co., Ltd.<br />

Steinberg, E. P. (1986). Practice for the Armed Services test. New York: Acco<br />

Publishing Co.<br />

Wallace, J. R. (1982). The Gideon criterion: The effects of selection criteria
on soldier capabilities and battle results, Research Memorandum 82-1.
Fort Sheridan: U.S. Army Recruiting Command, Research, Studies and
Evaluation Division, Program Analysis and Evaluation Directorate. (NTIS
No. AD A127 975)



Job Sets for Efficiency in Recruiting and Training (JSERT)’<br />

Jane M. Arabian and Amy C. Schwartz’<br />

U.S. Army Research Institute<br />

for the Behavioral and Social Sciences<br />

Alexandria, VA<br />

The Army is facing radical changes brought about by the reduction in the size of its force. The challenges encountered by the Army will require different and more efficient ways of going about the business of recruiting, selecting and classifying young men and women as they enter the service. Changes in enlisted end strength will have a dynamic impact on, for example, MOS fill and training seat utilization. In the past, changes in authorizations have caused a manpower surplus or shortage in various MOS. The delayed entry program (DEP) has not been able to provide enough flexibility to compensate for such near term authorization changes. Therefore, the Army has begun to evaluate the potential for a "job sets" concept to improve manpower and personnel management by fostering more timely, accurate personnel classification.

This paper will describe the rationale and tailoring of the JSERT concept to the<br />

particulars of the Army's current manpower and personnel environment. The general approach was to devise two parallel tracks: 1) the pragmatic identification of occupations (MOS) which would comprise a given "Job Set" and 2) an empirical research program for confirming the "Job Sets", devising a means for selecting an appropriate classification test battery, and developing a feedback/appraisal system for the JSERT concept.

Currently, in the vast majority of cases each recruit receives a contract for a specific occupation, such as M-1 turret mechanic. This contract is a legal commitment by the Army to provide the individual with the specific training for M-1 turret mechanics. This means that if the Army finds that it doesn't need as many M-1 turret mechanics as it had estimated, or that it needs more Bradley turret mechanics than expected, contracts must be renegotiated and the individuals involved may decide not to enlist. This may be costly, both in terms of dollars and loss of desirable individuals for service.

The Army has been able to accommodate small discrepancies in its estimates for<br />

personnel by tapping into the pool of recruits in the Delayed Entry Program (DEP). However, this does not always provide a satisfactory solution; individuals' contracts still

need to be honored. Given the anticipated changes in the size of the force and its<br />

composition, it is expected that it will become even more difficult to estimate accurately<br />

' Paper presented at the 32nd Annual Conference of the Military Testing Association, November 1990, Orange Beach, ALA.

’ All statements expressed in this paper are those of the authors and do not<br />

necessarily express the official opinions or policies of the U.S. Army Research Institute<br />

or the Department of the Army.<br />


the Army’s near-term personnel needs and to make up for the differences with the DEP.<br />

More flexibility in manpower management and personnel assignment is needed. The<br />

development of “Job Sets”, as described below, would give the Army such flexibility.<br />

Grouping Jobs<br />

Many MOS have the same or very similar entry requirements (i.e., Armed Services<br />

Vocational Aptitude Battery (ASVAB) Aptitude Area [AA] composite score cut-offs and<br />

physical [e.g., vision] profiles). This is especially true of MOS in the same CMF or Career<br />

Management Field (such as Mechanical Maintenance). It would therefore seem possible<br />

to group such MOS into “sets” for recruiting and enlistment purposes. The Army would<br />

then be able to enlist soldiers as turret mechanics, for example, without specifying, at the<br />

time of enlistment, which type of turret mechanic training they would receive. This would<br />

give the Army just that much more manpower management flexibility. Much closer to the<br />

point of actually filling training seats, the Army would be able to determine which<br />

individual would receive which specific course of training.<br />

As with any change to an established system, implementation of this JSERT concept<br />

will cause disruptions and periods of awkward adjustment. However, the concept does fit<br />

well with other current Army cost-saving initiatives (e.g., consolidating MOS and reducing<br />

the number of reception battalions) and appears to offer important benefits. This is not<br />

to trivialize the adjustments that will need to be made by, for example, the recruiting

and especially the training communities. Therefore, several steps have been taken to<br />

minimize the potential down-side of JSERT-related changes. These measures are described<br />

below.<br />

Identification and Coordination With Key Players<br />

Working closely with the Army’s Manpower and Personnel Management/Enlisted<br />

Accessions Policy office, key agencies and functions that would be affected by JSERT were<br />

identified. A “strawman” concept was circulated, briefed and discussed with each key player<br />

over a four-month period. The concept was refined and modifications were made based<br />

on the inputs received. One key refinement was the addition of parallel tracks, one for

testbed implementation and one for research and development. The tracking will be<br />

described shortly.<br />

In addition to exploring the concept with Army personnel, the Air Force classification<br />

system was also examined. While many of the manpower and personnel issues faced by the<br />

two services are quite different, the JSERT concept is not drastically different from the<br />

Air Force’s current classification system.<br />

After these information gathering and coordination efforts, a key players research and<br />

planning meeting was held. This provided the opportunity for further explication of the<br />

JSERT concept, exchange of concerns, identification of roles and responsibilities, selection<br />

of candidate job sets for a testbed, and the joint determination of milestones for the<br />

implementation of the testbed.<br />



Parallel Tracks<br />


Given the general desire for a swift remedy to the manpower management difficulties,<br />

the development and execution of a comprehensive R & D program to address the issues<br />

raised by changing the Army’s recruiting, enlisting, and training systems was simply not<br />

feasible. Therefore, two tracks have been devised for the JSERT concept.<br />

Track One. The key feature of this track is that it is driven by practical considerations. In order to put the JSERT concept into practice as quickly as possible, jobs can be formed into sets based on "face validity". A primary concern is to minimize disruption to the CMF structure, and to take into account logistic, training and cost considerations. Therefore, at least the initial job groupings would involve MOS that currently use the same aptitude area composites and cut scores, have the same proponents and are trained in the same location.

The candidate MOS identified at the key players meeting represent a key milestone of<br />

the Track One effort. The candidate MOS have been circulated among the appropriate<br />

proponents for review and comment. Their input will be used to make the final<br />

determination of job sets for the testbed implementation.<br />
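As a rough sketch of the Track One grouping rule described above, the fragment below collects hypothetical MOS records into candidate job sets keyed on aptitude area composite, cut score, proponent, and training location; all of the records and field names are invented for illustration and are not actual Army data.

```python
# Hypothetical sketch of the Track One rule: MOS sharing the same aptitude
# area composite, cut score, proponent, and training location fall into one
# candidate job set. Every record below is invented; real job sets would be
# confirmed through proponent review as described in the text.
from collections import defaultdict

mos_records = [
    {"mos": "MOS-1", "composite": "MM", "cut": 100, "proponent": "Ordnance", "site": "School A"},
    {"mos": "MOS-2", "composite": "MM", "cut": 100, "proponent": "Ordnance", "site": "School A"},
    {"mos": "MOS-3", "composite": "CO", "cut": 90,  "proponent": "Infantry", "site": "School B"},
]

job_sets = defaultdict(list)
for rec in mos_records:
    key = (rec["composite"], rec["cut"], rec["proponent"], rec["site"])
    job_sets[key].append(rec["mos"])

for key, members in sorted(job_sets.items()):
    print(key, "->", members)
```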

Track Two. While Track One is getting under way, the Track Two efforts have begun.<br />

Track Two efforts form an empirical, applied research program characterized by three<br />

primary features: 1) Within job set validity confirmation, 2) Classification battery<br />

selection, and 3) System appraisal/feedback.<br />

The job set validity confirmation consists of developing analytic tools or models to<br />

determine the fit of jobs with any given set. For example, attribute (skill/ability) or job task<br />

taxonomies can be used by individuals familiar with the jobs to provide “job profiles” ( e.g.,<br />

identification of the tasks making up a job and their importance or criticality). These<br />

profiles can then be compared across jobs and a judgement made as to the acceptability of<br />

the similarities or dissimilarities. If the profiles of the jobs appear too dissimilar or if one<br />

job stands out as too different from the other jobs then there would be a basis for<br />

eliminating some job(s) from the set.<br />

The job descriptions or profiles described above can also be used to help the Army<br />

identify additional classification tests. The elements of the profiles can be linked to tests<br />

of individual abilities (i.e., predictor tests). The tests may then be used to help place (or<br />

classify) individuals into jobs where they are most likely to perform well.<br />

In fact, as part of Project A, the Army’s comprehensive project to improve the selection<br />

and classification system, a new battery of predictor tests was developed. So now, in<br />

addition to ASVAB which measures primarily cognitive ability, the Army has the<br />

opportunity to assess an individual’s spatial and psychomotor abilities, temperament and<br />

vocational interests. The additional information provided by these tests can help the Army make better use of its human resources by improving the match between a soldier's abilities

and the job’s requirements.<br />

For this part of the JSERT program, research will be conducted to develop a<br />

methodology for building tailored classification batteries. These special batteries would<br />

be used for assigning soldiers within a job set to a particular job. The first requirement,<br />



however, would be to determine whether or not special classification testing is warranted.<br />

Since job performance can frequently be improved by selecting individuals with particular<br />

skills and/or by training particular skills, the trade-offs implied by testing for selection<br />

versus training would have to be considered.

Two research efforts are currently planned to develop the tailored classification<br />

batteries. The first effort will expand upon the Cognitive Requirements Model (CRM)<br />

developed by Hay Systems, Inc. to include spatial and psychomotor elements. This new<br />

model, CRM+, will employ decision flow diagrams to guide job experts through the<br />

elements of the model. Attributes identified in this manner will be compared across jobs<br />

to determine the similarities and differences in job requirements. The same information will

also be used to identify classification tests most likely to improve the person-job match<br />

within the job set.<br />

The second effort to develop tailored test batteries will build upon the research<br />

conducted in the Army’s Synthetic Validation Project (SynVal) by the American Institutes<br />

for Research with Personnel Decisions Research Inc. and the Human Resources Research<br />

Organization. Subject matter experts will be asked either to identify directly the importance<br />

and level of attributes for jobs or to identify job tasks from the Army Task Taxonomy.<br />

Either visual inspection of the resulting profiles or more formal clustering algorithms will<br />

then be used to compare the profiles. The profile elements may then be matched up with<br />

the predictor tests. The identified tasks can be linked, using the results of SynVal, to<br />

attributes and to the tests which measure those attributes (or the directly identified<br />

attributes may themselves be used to identify the appropriate predictor tests).<br />

An important consideration of both these efforts will be to develop valid procedures<br />

that are credible and can be employed by non-scientists. Indeed, it is expected that any<br />

additional testing that may be adopted by the Army will be administered, scored and used<br />

in the assignment decision-making process by Army personnel during basic training, prior<br />

to the start of Advanced Individual Training. Therefore, the procedures must be<br />

straightforward, non-technical, and cost-efficient. A demonstrable value for administering<br />

any additional tests (i.e., improved job performance, reduced training time, lower attrition)

to off-set the costs and inconvenience of specialized testing must be clearly evidenced.<br />

The third JSERT research focus is on the development of an appraisal feedback system<br />

for the JSERT “system” itself. The goal of the feedback system is to monitor the<br />

performance of JSERT, not the job performance of individuals per se. Although ratings<br />

of performance or training needs may be solicited from supervisors and individual soldiers,<br />

the ratings would be used for research purposes or for operational changes to the JSERT<br />

system, not to affect the careers of the rated individuals. The concern is to set up a<br />

monitoring system so that if jobs change over time or there is a shift in the overall abilities<br />

of soldiers being enlisted, the Army would have some consistent means of evaluating the<br />

change, documenting the impact on job performance and notifying the system that some<br />

action is needed.<br />

It may be, for example, that initial job analyses did not include some ability which,<br />

although not currently measured by the Army, turns out to be important for job

performance. The Army may wish to specifically select individuals with this ability, but<br />

presently there is no mechanism in place that would even uncover the requirement for that<br />



ability. [The closest "system" the Army has for modifying the classification requirements is

to notice there is some problem, such as high attrition, and then request technical advice<br />

from ARI to identify the problem and suggest solutions.]

The JSERT feedback system would provide a more formal, standardized mechanism.<br />

The feedback system must be proven to be scientifically valid and reliable for not only<br />

ensuring that the classification system is working satisfactorily, but also to provide a<br />

framework for intervention. Feedback results indicating a gap between soldier abilities, for<br />

example, and job requirements may indicate that a modification of the training curriculum<br />

is needed and/or that the classification test battery should be altered. The Army will have<br />

the results of the feedback, in addition to any cost-benefit analyses, upon which to base its<br />

correction strategy. Basically, the feedback system will create a means for getting<br />

information about how well the classification battery is working from the field (end-users)<br />

back to the classifiers.

Conclusions<br />

While there will be disruptions to the recruiting, selection/classification and training<br />

systems, changes in roles and responsibilities (e.g., a shift in responsibility from recruiting<br />

to training commands for managing MOS fill), and modifications to computer programs<br />

(ATTRS, REQUEST), the potential benefits of the JSERT concept are considerable. The<br />

concept will: increase the opportunity for MOS consolidation and CMF restructuring,<br />

reduce MOS codes for recruiting, fit well with efforts to consolidate Basic Training sites,<br />

increase the potential for improved soldier-job matches (classification), and increase much<br />

needed manpower management flexibility.<br />

The project has the support of the Office of the Deputy Chief of Staff for Personnel<br />

and the selection of MOS for three potential testbeds (Infantry, Quartermaster, and<br />

Ordnance) is currently being finalized. Although a target start date for the testbed has<br />

been selected (July 1991), it is not clear how the testbed will proceed. The downsizing of<br />

the Army together with high recruiting levels means that approximately 30% of the FY91<br />

accession mission is already in the DEP with contracts for specific MOS training. It is<br />

conceivable that if this pattern keeps up, it will be very difficult to change over fairly

smoothly to the more general enlistment contracts needed to implement the JSERT<br />

concept. Nevertheless, the portions of the project that can proceed (the research elements)<br />

are underway.<br />



Development of a New Language Aptitude Battery

The Defense Language Institute Foreign Language Center (DLIFLC) is the<br />

proponent for the current Defense Language Aptitude Battery (DLAB) and is<br />

also the primary agency with the mission of providing language training for<br />

DOD military personnel. DLAB currently exists in one form. The range of<br />

correlations between DLAB and post-training measures of language<br />

proficiency across different language courses and skill modalities is from<br />

.25 to .55. DLIFLC is seeking to develop an improved aptitude test that<br />

would predict the degree to which a potential student will develop language<br />

proficiency in speaking, reading, and listening skills, and also determine<br />

the language or languages to which a potential student is best suited.<br />

This development effort builds upon an extensive database gathered on DLIFLC students in a major ongoing project to identify predictors of success in language training and factors associated with the presence,

direction, and extent of language skill change after training.<br />

Background

At initial screening, candidates for language training must attain a<br />

minimum score on a specified composite of the Armed Services Vocational<br />

Aptitude Battery (ASVAB) in order to be eligible to take the DLAB. There

is some variation in the definition of required ASVAB composites across the<br />

Services, and certain variations in composite cut scores contingent on<br />

eventual training assignment.<br />

Approximately forty different language courses are taught either at<br />

the DLIFLC Monterey campus or through contract arrangements at other<br />

training locations. The length of basic foreign language courses varies

from 24 to 47 weeks.<br />

DLI concentrates on general foreign language skill training with only<br />

relatively modest specialized training oriented toward specific job<br />

applications. After graduating from DLIFLC and prior to job assignment,<br />

military linguists typically receive advanced individual training (AIT)<br />

building on prerequisite basic language skills. Linguists perform a<br />

variety of sensitive jobs in signal intelligence, human intelligence, and<br />

in a liaison capacity with foreign governments and military forces.

DLIFLC maintains significant contacts with other government and<br />

non-government language training schools and universities. These contacts

have been helpful to DLI in developing instructional systems and measures<br />

of training success that are relatively general in nature, while allowing<br />

more specialized training to benefit from the generally high positive<br />

transfer from basic language skills to more specialized training.<br />

Previous Research<br />


Since 1985, DLIFLC has actively participated in a joint research effort<br />

under the sponsorship of the U.S. Army Intelligence Center and School<br />

(USAICS), with support from the Army Research Institute for the Behavioral and Social Sciences (ARI). This project, known as the Language Skill Change Project (LSCP), investigated the following factors:

1. Optimal predictors of success in language training available at<br />

initial screening prior to assignment of language training.<br />

2. Predictors of training success available during training.<br />

3. Variables related to change in language skills after DLI language<br />

training.<br />

The research design involved the collection of an extensive data base<br />



on 1903 Army linguists in four linguist military occupational specialties (MOS) who had received language training in Spanish, German, Korean, and Russian at DLIFLC. Data collected at several points in the career cycle of

these linguists included the following elements:<br />

1. ASVAB and DLAB scores at time of selection.<br />

2. Personality measures, interest inventories, and supplementary<br />

cognitive measures collected prior to the beginning of language training.<br />

3. Measures of the extent and nature of motivation to learn foreign<br />

languages collected prior to and during language training.<br />

4. Inventories of student learning strategies collected at two<br />

different times during their language courses.<br />

5. The Defense Language Proficiency Test (DLPT) , a series of measures<br />

of foreign language proficiency in listening, reading, and speaking skills<br />

collected immediately after graduation from language training, after<br />

subsequent AIT , and at subsequent annual administrations as mandated by<br />

Army regulations.<br />

DLIFLC and ARI coordinated with the Office of Personnel Management

(OPM) to obtain contractor support to build, collect, manage, and analyze<br />

the LSCP data base. In order to build upon information derived from the<br />

LSCP analyses, DLIFLC requested the contractor, Advanced Technologies<br />

Incorporated (ATI), to submit a plan for the design and development of a

revised DOD language aptitude battery. The remainder of this paper draws<br />

heavily from that plan.<br />

Proposed Development Efforts

The following conclusions drawn from the LSCP data analysis were<br />

relevant to the design of the new battery:<br />

1. Substantial prediction of success in language training, as measured<br />

by the DLPT, was afforded by factors not presently considered in language<br />

selection.<br />

2. The relationship of predictors to criterion performance differed<br />

across languages represented in the study, and within individual languages<br />

across the criterion skills of listening, reading, and speaking.<br />

Consequently, the ATI management plan recommended two approaches for

improving linguist selection and subsequent military linguist performance:<br />

1. to expand the range of factors considered in predicting success<br />

beyond those presently reflected in ASVAB composites and the current DLAB.<br />

2. to attempt to tailor prediction by language and language skill.

From the very beginning, certain constraints on the development of a<br />

new language aptitude instrument were recognized.<br />

First of all, although the new aptitude test will attempt to provide a more exhaustive assessment of the potential military linguist's

capabilities, the large-scale nature of its use, the time and resources<br />

that are likely to be available for its administration, and concerns about<br />

fatigue on the part of those taking the instrument necessitate that the<br />

time allotted for the new test not greatly exceed that of the current

DLAB. A possible mechanism for achieving maximal efficiency in measurement<br />

would be to use adaptive testing techniques; however, careful consideration<br />

would have to be given to the nature and interrelation of the traits<br />

underlying the abilities to be measured (as yet underspecified), and to the<br />

hardware requirements of such a system and their implications for test<br />

administration.<br />
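To make the adaptive-testing idea concrete, the sketch below selects, after each response, the unused item with the greatest information at a provisional ability estimate under a one-parameter (Rasch) model. The item difficulties, responses, and crude update rule are all hypothetical and do not describe any planned DLAB design.

```python
# Hypothetical sketch of adaptive item selection under a Rasch model: pick the
# unused item with maximum information at the current ability estimate. The
# item bank, responses, and fixed-step ability update are placeholders only.
import math

def p_correct(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def information(theta, b):
    p = p_correct(theta, b)
    return p * (1.0 - p)

item_bank = [-1.5, -0.5, 0.0, 0.5, 1.5]   # hypothetical item difficulties
theta, used = 0.0, set()

for response in [1, 1, 0]:                # made-up right/wrong answers
    best = max((b for b in item_bank if b not in used),
               key=lambda b: information(theta, b))
    used.add(best)
    theta += 0.5 if response == 1 else -0.5   # stand-in for a real ML update
    print(f"administered item b = {best:+.1f}, provisional theta = {theta:+.1f}")
```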

Secondly, it would not be desirable to test different abilities or use<br />

different prediction and scoring formulae for every one of the forty<br />

languages taught in the Defense Foreign Language Program (DFLP). It would<br />



be preferable to group languages into a small number of categories sharing

similar ability requirements and prediction characteristics.<br />

Strategy for Development Effort<br />


The sequence of activities proposed in designing the new aptitude<br />

battery is depicted graphically in Figure 1.<br />

[Figure 1. Flow diagram of the proposed sequence of development activities; legible elements include: Define Measurement Options, Identify Language Ability Requirements, Group Languages by Ability Requirements, and Produce DLAB II.]

The first activity listed is to define the components of the criterion<br />

to be predicted--foreign language proficiency. A first cut might be the traditionally identified skill modalities of listening, reading, speaking, and writing; these traditional categories will need to be further analyzed into much more specific task components.

The next three activities planned are to identify how languages differ<br />

on proficiency dimensions, to identify abilities required to develop each<br />

type of proficiency, and to identify language differences in ability<br />

requirements. Note that the arrows in Figure 1 do not all point in a<br />

forward direction toward higher numbered activities. As explained below,

these activities are expected to be interactive and iterative processes and<br />

to involve the synthesis of several types of information into a final<br />

product based on consensus of project team members.<br />

The contractor and DLIFLC decided that the accomplishment of Activities<br />

1 through 6 would be best facilitated by an interdisciplinary approach<br />

involving a DLI expert in language proficiency testing, a cognitive<br />

psychologist specializing in the area of foreign language learning, and<br />

foreign language curriculum specialists with expertise in a wide variety of<br />

foreign languages. The intent was to combine insights from the traditional<br />



perspective of linguistic analysis with a cognitively oriented analysis of<br />

the language learning process. The interdisciplinary team is expected to develop a comprehensive list of abilities involved in learning foreign languages, including abilities that may be required in some languages but not all. On the one hand, this involves ensuring that the definition of language proficiency is detailed enough that the cognitive abilities

required to develop each category of proficiency can be specified. On the<br />

other hand, it involves reaching a consensus that the list of abilities is<br />

broadly applicable across foreign languages and across diverse training<br />

programs in these foreign languages.<br />

It is anticipated that the team would draft a preliminary taxonomy of<br />

abilities as a baseline list of skills which would be iteratively modified<br />

and improved as attempts were made to apply it to successive individual<br />

languages. At the same time the team would highlight any features of each<br />

successive language that experience had shown were relatively hard or<br />

relatively easy for native English speakers to learn. This process would<br />

serve both to validate the taxonomy and to identify differences between<br />

languages in cognitive abilities required.<br />

In an effort parallel to the validation of


Implementation of Content Validity Ratings<br />

in Air Force Promotion Test Construction<br />

Carlene M. Perry<br />

United States Air Force Academy<br />

John E. Williams<br />

Paul P. Stanley II<br />

USAF Occupational Measurement Squadron<br />

The USAFOMS has implemented a procedure in which subject-matter experts<br />

(SMEs) rate the content validity of individual test questions on Air Force<br />

promotion tests. This paper describes that procedure and assesses its impact<br />

upon test content and the perceptions of those involved in test development.

Specialty Knowledge Tests (SKTs) are 100-item multiple-choice tests which

measure airmen's knowledge in their assigned Air Force specialties. Promotion<br />

to the ranks of staff sergeant (E-5) through master sergeant (E-7) is<br />

determined solely by relative ranking under the Weighted Airman Promotion<br />

System (WAPS). Each airman receives a single WAPS score, the sum of six com-<br />

ponent measures, with the SKT accounting for up to 22% of the total. Since

the other components generally yield little variance among individuals, the<br />

SKT is often the deciding factor in determining who gets promoted.<br />

SKTs are written by teams of senior NCOs with the assistance of USAFOMS psychologists.<br />

They are constructed using the content validity strategy of validation.<br />

The following components ensure a direct and close relationship<br />

between test items and important facets of the specialty being tested: 1)<br />

the specialty training standard, an Air Force document which identifies the<br />

primary duties and responsibilities in a specialty, 2) CODAP-based occupational<br />

analysis data collected by USAFOMS, and 3) the expertise of the SMEs<br />

themselves.<br />

Content validity is thoroughly documented for the more than 400 SKTs in the<br />

USAF inventory. However, the documentation is predominantly qualitative<br />

rather than quantitative in nature, as is the norm with tests based on this<br />

strategy. Test developers at USAFOMS felt that a quantitative means of assessing<br />

content validity would be useful to improve their feedback to test<br />

writers and to make it possible to study test results longitudinally to help

identify problem tests.<br />

Lawshe (1975) developed just such an approach, the first to focus on content validity in a quantitative, rather than a qualitative manner. His method called for a panel of subject-matter experts (SMEs) to independently rate test items using the following scale:

Is the skill (or knowledge) measured by this test question:
- Essential (2),
- Useful but not essential (1), or
- Not necessary (0),
for successful performance on the job?

He then used a test of statistical significance with the content validation<br />

panel’s ratings as a basis for eliminating items from a test item pool.<br />
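For reference, the quantity to which Lawshe's significance test is applied is his content validity ratio, computed per item from the panel's "essential" ratings; the panel responses below are hypothetical. (As described later in this paper, the USAFOMS adaptation works from mean ratings rather than from this ratio.)

```python
# Lawshe's (1975) content validity ratio for a single item:
#   CVR = (n_e - N/2) / (N/2),
# where n_e is the number of panelists rating the item "essential" (2) and N
# is the panel size. The ratings below are from a hypothetical five-member panel.

def content_validity_ratio(ratings):
    n = len(ratings)
    n_essential = sum(1 for r in ratings if r == 2)
    return (n_essential - n / 2) / (n / 2)

print(content_validity_ratio([2, 2, 2, 1, 2]))   # 4 of 5 essential -> 0.6
```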

Lawshe’s procedure has been applied by a number of investigators in a variety<br />

of situations. Distefano, Pryer, and Craig (1980) used his content valida-<br />


tion procedure to assess a job knowledge test being used as a criterion measurement<br />

of psychiatric aide training success. They stated, "It is evident<br />

that the content validity could be improved in subsequent revisions if the<br />

method is used as part of the test construction process.”<br />

Kesselman and Lopez (1979) developed an accounting job knowledge test using<br />

the procedure which they found to be superior to a commercially available<br />

mental ability instrument in predicting two criteria: supervisor assessment

of subordinate job knowledge and supervisor assessment of overall job perfor-<br />

mance.<br />

Distefano, Pryer, and Erffmeyer (1983) showed that a variation of the Lawshe<br />

method could be used in the development of a behavioral rating scale of job<br />

performance, while providing quantitative content validity evidence of the

criterion scale.<br />

Finally, Carrier, Dalessio, and Brown (1990) used Lawshe’s three-point scale<br />

to focus on the correspondence between inferences made using content validity<br />

and criterion-related validity strategies. They found that for experienced<br />

candidates, job experts seemed to be able to identify those items on an interview<br />

guide that predicted the commissions of personnel into the Life Insurance<br />

Marketing and Research Association. They also noted that ". . .using

content validity as the sole evidence for test validity should be limited to<br />

situations where test developers are working with well-defined constructs,<br />

such as acquired skills or specific knowledge.”<br />

Method<br />

Two content validity rating (CVR) forms were developed using a Lawshe-type<br />

scale, one form for development of the SKT taken for promotion to E-5 and one

for the development of the SKT taken for promotion to E-6 and E-7. The<br />

USAFOMS forms incorporated minor adjustments to the Lawshe approach. In par-<br />

ticular, it was necessary to reference the grade level of the test, since

knowledges required for successful performance of the E-5 duties may be considerably<br />

different from those required to perform E-6 and E-7 duties. In<br />

addition, the USAFOMS forms focused on successful performance in the specialty,<br />

not just in the job, a much broader view, since a specialty encompasses a<br />

family of related jobs.<br />

Whereas Lawshe used a test of statistical significance with the panel's ratings,<br />

this was not practical at USAFOMS because of the small number of individual<br />

raters being used. Only two to six SMEs normally participate in a<br />

test development project. To require statistical significance with such a<br />

small sample would require unanimous agreement of item essentiality. Rather<br />

than impose strict statistical criteria with the new ratings, the USAFOMS<br />

policy was stated as follows: "CVRs will be used to encourage SMEs to focus first on the appropriateness of test item content as it relates to successful

performance in the specialty." SMEs were not required to take special actions<br />

as a result of the various ratings. In essence, SMEs could retain (ei-<br />

ther reuse on the next revision of the test or designate as an alternate) or<br />

deactivate (designate as unacceptable for reuse) an item without regard to<br />

their ratings on the CVR forms. There are, however, other requirements such<br />

as clear reference support and acceptable item statistics that must be met if<br />

an item is to be retained.<br />

This research examines how Lawshe's procedure, with the noted modifications,

was employed at USAFOMS -- an organization whose promotion tests impact most<br />



Air Force enlisted specialties. The first objective was to determine the<br />

extent to which SME ratings on the CVR forms impact subsequent identification<br />

of items as acceptable or unacceptable for reuse. The second is to determine<br />

how SMEs and project psychologists perceive the value and usefulness of the<br />

forms. Ninety-four SMEs, representing 25 AFSs, assigned to USAFOMS for SKT<br />

rewrite duties were asked to rate test items from their respective E-5 and<br />

E-6/7 grade-level SKTs using the CVR forms. USAFOMS test development proce-<br />

dures require the completion of this step prior to the SMEs' designation of

an item as either acceptable for continued use on subsequent SKTs or as unac-<br />

ceptable for reuse. Once again, these test items were rated using Lawshe’s<br />

3-point scale. A rating of 2 was given to items whose content was essential.<br />

A rating of 1 was assigned to those items whose content was useful, but not<br />

essential, and a rating of 0 was assigned to those items whose content was<br />

not necessary for successful performance in the AFS. In all, 19,700 ratings were obtained from 94 raters for 2 SKT levels (E-5 and E-6/7).

Results<br />

Intraclass correlations for each of the 25 AFSs were computed to determine<br />

the interrater reliabilities for the group of SMEs from each specialty. All<br />

but two of the calculated values had p < .05. (The higher reliability values<br />

obtained seemed to be associated with the more technologically specialized<br />

fields where there is little room for variance of procedures across the<br />

Air Force, thus leading to more agreement among SMEs on items which test essential<br />

knowledge. Lower values seemed to be associated with broader specialties<br />

where there is more variance in day-to-day jobs performed and hence,<br />

less agreement and lower values of reliability.)<br />
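A hedged sketch of an interrater reliability computation of the kind reported here is shown below. The paper does not state which intraclass correlation form was used; the one-way random-effects ICC and the ratings matrix are illustrative assumptions only.

```python
# Illustrative one-way random-effects intraclass correlation (ICC(1,1)) for a
# panel of SMEs rating the same items. Which ICC form USAFOMS actually used is
# not stated in the paper; the ratings matrix below (rows = items, columns =
# raters, values 0/1/2) is hypothetical.

def icc_oneway(ratings):
    n, k = len(ratings), len(ratings[0])              # items, raters per item
    grand = sum(sum(row) for row in ratings) / (n * k)
    item_means = [sum(row) / k for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in item_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(ratings, item_means) for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

ratings = [[2, 2, 2, 1], [1, 1, 0, 1], [2, 2, 2, 2], [0, 0, 1, 0], [1, 2, 1, 1]]
print(round(icc_oneway(ratings), 2))   # about 0.70 for these made-up ratings
```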

The average CVR for items chosen as acceptable and for those designated for<br />

deactivation were also calculated for each test project. The mean CVR value<br />

for all deactivated items was 1.28 and the mean CVR value for all acceptable<br />

items was 1.43. These results conformed to our expectations that on the<br />

whole, items selected for deactivation would have lower content validity ratings<br />

than those chosen as acceptable. The average CVR value for all 19,700<br />

ratings obtained was 1.40. This average reflects the fact that on the whole,<br />

Air Force SKTs are viewed as being relatively high in content validity.<br />

To determine the actual impact, if any, of the content validity ratings on<br />

the subsequent identification of an item for reuse, a chi-square test of statistical<br />

significance was computed. The null hypothesis (H0) for this test states that there is no difference between the proportion of items selected as acceptable and unacceptable in each rating category. The alternative hypothesis (H1) is that the distribution of items in each rating category differs from the hypothesized one. The results, as shown in Table 1, indicate

significant differences between expected and observed values for acceptable<br />

and deactivated items in each rating category, with the largest differences<br />

occurring between ratings of 2 and 0. As shown, 203 more ratings of 0 were<br />

observed for deactivated items than was expected, while 234 more ratings of 2<br />

were observed for acceptable items than was expected. A chi-square value of<br />

199.7 (df=2) was obtained, indicating significance at the .01 level. On

this basis, the null hypothesis was rejected, indicating a disproportional<br />

representation of items selected as acceptable and unacceptable in each rating<br />

category. This shows that item content validity did impact subsequent<br />

identification of item acceptability.<br />

A point-biserial correlation coefficient relating identification of an item<br />

as either acceptable or deactivated with the item's average content validity<br />



rating was also computed. The resulting correlation coefficient, .0894, is significant at the .01 level, yet is rather low. This can be attributed to

the fact that of the 19,700 total ratings, 8,387 were ratings of 1. Since<br />

the largest differences in proportional representations were found to be in<br />

the rating categories of 0 and 2 and the number of ratings of 1 were within<br />

31 ratings of the expected value, it appears that the correlation coefficient<br />

may have been decreased by the large number of ratings of 1.<br />

Table 1. Comparison of Acceptable and Deactivated Test Items

OBSERVED
             Acceptable   Deactivated    Total
Rating 0          1213           515     1728
Rating 1          6840          1547     8387
Rating 2          8086          1499     9585
Total            16139          3561    19700

EXPECTED
             Acceptable   Deactivated    Total
Rating 0        1415.6         312.4     1728
Rating 1        6871.0        1516.0     8387
Rating 2        7852.4        1732.6     9585
Total            16139          3561    19700

X2 = 199.7, df = 2, p < .01
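The chi-square reported in Table 1 can be reproduced directly from the observed counts; the short computation below derives the expected frequencies from the row and column totals and sums (O - E)^2 / E over the six cells.

```python
# Reproduces the Table 1 chi-square from the observed counts alone: expected
# frequencies come from the row and column totals, and the statistic sums
# (O - E)^2 / E over the six cells. The result should come out at roughly the
# reported 199.7 with df = (3 - 1) * (2 - 1) = 2.

observed = {            # rating category -> (acceptable, deactivated) counts
    0: (1213, 515),
    1: (6840, 1547),
    2: (8086, 1499),
}

col_totals = [sum(row[i] for row in observed.values()) for i in (0, 1)]
grand_total = sum(col_totals)

chi_sq = 0.0
for row in observed.values():
    row_total = sum(row)
    for i, obs in enumerate(row):
        expected = row_total * col_totals[i] / grand_total
        chi_sq += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)
print(f"chi-square = {chi_sq:.1f}, df = {df}")
```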

Through the analysis of the data, a number of unusual cases surfaced. The<br />

data contained in these cases was contrary to expectations and evoked further<br />

analysis. For instance, 10 of the 5,200 test items evaluated were given ratings of 0 (not necessary) by all SMEs, yet were still retained as acceptable

items. Perhaps the SMEs reconsidered the item content and found it essential<br />

to successful job performance. Furthermore, 117 of the 940 deactivated items<br />

were given ratings of 2 (essential) by all SMEs. After obtaining the reasons<br />

for deactivation, these items were broken down into six categories:<br />

REASON FOR DEACTIVATION                       # ITEMS   % OF TOTAL
Poor Statistics/No Acceptable Revision             57         49%
Inadequate Reference                               37         32%
Obsolete Item                                      11          9%
Two or More (or No) Correct Answers                 8          7%
Low Content Validity                                3          3%
Inadvertent Duplication of Item                     1          1%

As stated earlier, there are other requirements not directly related to content<br />

validity that must be met if an item is to be retained. These include<br />



clear reference support and acceptable item statistics; thus, most of the 117

items were deactivated for valid reasons. It was surprising, however, that<br />

3% of the items in question had "low content validity" cited on the item

record card as the reason for deactivation when originally all SMEs had felt<br />

the item content was essential for successful job performance. These results<br />

could be due to administrative error, SME reconsideration of the item con-<br />

tent, or a number of other possible reasons.<br />

The second objective of this research was to determine how SMEs and project<br />

psychologists perceived the value and usefulness of the CVR forms. A three-

question survey was administered to 21 project psychologists at USAFOMS. A<br />

similar survey was administered to 151 SMEs upon completion of the rating<br />

forms and selection of items for deactivation. A four-point rating scale was<br />

used for the responses: strongly agree, agree, disagree, and strongly dis-

agree. The questions and summary of responses are as follows:<br />

(1) When selecting previously used items to be reused, the Content Validity Rating forms helped identify those items most essential to successful performance in the specialty.
    49% (74) SMEs answered positively (agree or strongly agree)
    52% (11) project psychologists answered positively

(2) The Content Validity Rating forms helped bring out different points of view for discussion.
    56% (85) SMEs answered positively
    76% (16) project psychologists answered positively

(3) The Content Validity Rating forms were a valuable tool in selecting items to be reused.
    39% (59) SMEs answered positively
    43% (9) project psychologists answered positively

Discussion

Through the analysis previously described, it became apparent that there was<br />

a significant impact of content validity ratings on subsequent identification<br />

of an item as acceptable or deactivated. For instance, with the 25 projects<br />

sampled, there were 52 individual SKTs examined. Of these 52 SKTs, 37 had<br />

higher average CVR values for acceptable items than the average CVR values of<br />

the deactivated items, which is what would be expected -- a positive differ-

ence between the two. It is also important to note that 6 of the 52 SKTs are<br />

not applicable in this analysis since in these cases all 100 items were designated<br />

as acceptable and thus, there was no average CVR value for deactivated<br />

items. The 9 remaining SKTs had higher average CVR values for the deactivated<br />

items than the average CVR values for the acceptable items. Of these 9<br />

SKTs, 2 were from projects with insignificant interrater reliabilities. Even<br />

though the expected effect did not hold true for every case, overall, items<br />

higher in content validity had a greater chance of being acceptable while

items lower in content validity were more likely to be deactivated. This<br />

shows that on the whole, item content validity does play a role in the SMEs’<br />

evaluation of an item's testworthiness.<br />

The second objective of the research was to determine how SMEs and project<br />

psychologists perceived the usefulness of the CVR forms. It became evident<br />

that there was no universal agreement on the usefulness of the forms. Additionally,<br />

any project psychologist biases, either for or against the use of<br />

the forms, may have influenced how the psychologist administered the forms to<br />



the SMEs. This in turn may have biased the SMEs' ratings on the CVR forms and on the surveys as well.

The results of this study suggested several areas for future research.<br />

First, to the extent that the forms are used, one would expect the content<br />

validity of SKTs to improve over time since ideally, the forms would help<br />

SMEs identify and retain those items most essential to successful performance<br />

in the specialty. This could be observed by charting the average content<br />

validity rating for all SKTs over a period of years.<br />

Also, with the imminent manpower cutbacks, the USAFOMS mission may be directly<br />

affected. By charting these average content validity rating values, it<br />

would be possible to see whether the content validity of the tests declined. This would be helpful in illustrating the impact of cutbacks on the test development

mission at USAFOMS.<br />

Finally, it would be interesting to examine the relationship between project<br />

psychologist and SME responses to the survey questions and to investigate the<br />

possibility of psychologist biases affecting the SMEs’ use of the forms.<br />

Although the attitudinal portion of the research showed some disagreement as<br />

to the value of this quantitative procedure, statistically, the content validity<br />

of the items has a significant impact on the subsequent identification<br />

of items for reuse.<br />

References<br />

Carrier, M. R., Dalessio, A. T., and Brown, S. H. (1990).<br />

Correspondence<br />

between estimates of content and criterion-related validity values. Personnel<br />

Psychology, 43, 85-100.<br />

Distefano, M. K., Jr., Pryer, M. W., and Craig, S. H. (1980). Job-relatedness

of a posttraining job knowledge criterion used to assess validity and<br />

test fairness. Personnel Psychology, 33, 785-793.<br />

Distefano, M. K., Jr., Pryer, M. W., and Erffmeyer, R. C. (1983). Application
of content validity methods to the development of a job-related performance
rating criterion. Personnel Psychology, 36, 621-631.

Fitzpatrick, A. R. (1983). The meaning of content validity. Applied Psychological<br />

Measurement, 7, 3-13.<br />

Kesselman, G. A. and Lopez, F. E. (1979). The impact of job analysis on em-<br />

ployment test validation for minority and nonminority accounting personnel.<br />

Personnel Psychology, 32, 91-108.<br />

Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel

Psychology, 28, 563-575.<br />



Interpreting Rating Scale Results:
What does a Mean Mean?

Barbara Jezior
U.S. Army Natick Research, Development, and Engineering Center
Natick, MA

Larry Lesher
GEO-CENTERS, Inc.
Newton Centre, MA

Richard Popper
Ocean Spray Cranberries Inc.
Lakeville-Middleboro, MA

Charles Greene
U.S. Army Natick Research, Development, and Engineering Center
Natick, MA

Vanessa Ince
U.S. Army Natick Research, Development, and Engineering Center
Natick, MA

How well soldiers like items they use in garrison or the field is often measured on<br />

Likert scales, and the mean ratings obtained from these scales are then used as<br />

indicators of user acceptance. In examining data contributing to 176 mean ratings<br />

of various Natickproducts we found that the means accurately predict the acceptor<br />

set, i.e. the percentage of soldiers who rated a product on the positive end of a scale.<br />

Knowing the percentage who find a product acceptable provides a more intuitive<br />

and concrete basis for product development or improvement decisions. For<br />

example, the product developer can operate from the knowledge that 66% find the<br />

product acceptable, insteadof amean rating that deems the product “slightly good.”<br />

Introduction<br />

Natick is deeply involved in consumer acceptability<br />

issues. We develop basic subsistence items for servicemen<br />

- rations, protective clothing, shelters, and airdrop<br />

equipment. These products support an annual procurement of over 3 billion dollars, making consumer (soldier) acceptance critical. Items that are unacceptable could sit in warehouses or never be used, and the soldier would be

lacking necessary equipment as well.<br />

To obtain additional quantitative information on how<br />

soldiers felt about our products, we started a large-scale<br />

systematic survey program six years ago. Like many, we<br />

operatedunder the assumption that one of thebest ways to<br />

measure and describe how well the soldier liked the<br />

products was to use the mean and other parameters derived<br />

from verbal rating scales.<br />

After analyzing over 7,000 questionnaires throughout<br />

the six years and writing many reports for managers and<br />

product project officers webegan toquestion this assumption.<br />

We ourselves began to get curiousabout what means<br />

were saying in respect to the measure of product accepta-<br />

241<br />

For instance, while we felt that a mean of 5 on a 7-point scale should denote a relatively acceptable product, we found that we usually had many more negative ratings than expected. Over time we also began to feel, on an intuitive level, that a mean of 6 on a 7-point scale indicated a "very" good product, but our verbal anchor was labelling such a product as "moderately" good.

Moreover, in describing survey results to product managers, we found that while the concept of an average is rather commonly understood, the accompanying parameters of standard deviations, skewness, etc., are not understood outside the research communities, nor should we expect them to be. The problem here is that a mean in isolation, which is what a manager is grappling with when not understanding its accompanying parameters, can be very misleading. A manager who makes product decisions without some sense of what a rating distribution is all about may make the wrong decisions.

Another problem with means for many is that they don't provide a good intuitive feel for what relative differences are in regard to measuring products, any statistically significant differences notwithstanding. For instance, if means differ by one scale point, some don't think that difference especially disconcerting, while others think it's monumental, and those viewpoints can be irrespective of whether there is an understanding of the underlying distribution or not.

These issues led us to the literature to see what had been reported on rating scale distributions in respect to product acceptability.

The literature showed that, in recent years, a new objective measure for determining level of product acceptability, labelled the "acceptor set," has been described in the marketing research literature, especially that of the food industry (Gordon and Norback, 1985). The measure has been used in conjunction with food product optimization techniques and market positioning (Lagrange and Norback, 1987).

While the acceptor set can be determined by a simple binary method (dichotomous question), both Gordon (1985) and Choi and Kosikowski (1985) described creating an "acceptor set" from scaled data by splitting the sample group into two percentages: the percentage who found a product acceptable and those who did not. For example, respondents to our 7-point scale (1 = "dislike very much" to 7 = "like very much") could be split into two groups, either the 5-7 group or the 1-4 group, with the 5-7 group constituting the acceptor set. Product optimization then means finding methods to increase the acceptor set percentages (however derived) as measures of product improvement.
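As a concrete illustration of the split described above, the short Python sketch below derives an acceptor set from raw 7-point ratings; the rating values are hypothetical and are not drawn from the Natick surveys.

```python
# Minimal sketch (hypothetical data): derive the acceptor set from 7-point ratings.
# Ratings of 5-7 count as acceptors; 1-4 (including the neutral point) do not.

ratings = [7, 5, 4, 6, 2, 5, 7, 3, 6, 5, 1, 6]  # hypothetical soldier ratings

acceptors = sum(1 for r in ratings if r >= 5)
acceptor_set_pct = 100.0 * acceptors / len(ratings)
mean_rating = sum(ratings) / len(ratings)

print(f"Mean rating: {mean_rating:.2f}")
print(f"Acceptor set: {acceptor_set_pct:.1f}% of respondents")
```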

Given those findings, we decided to look at the acceptor set concept in respect to our survey data base to see how we could add to the definition of product acceptability for both the manager and researcher.

Methodology<br />

Data base description<br />

The rating scale data were obtained on questionnaires administered to approximately 7,500 combat arms soldiers who rated products such as field rations, protective clothing, tents, and airdrop equipment. Data collectors went to the survey sites after these soldiers had returned from major training exercises where they had used one or more of the products. Entire units were tasked to participate in the surveys; soldiers in these units could refuse to fill out questionnaires if they chose, but few did. The sample size at each site ranged from 200 to 400. The soldier population was male, with over 90% between the ages of 19 and 23 and serving in the enlisted ranks E-2 to E-4.

The verbal rating scales were either 7-point or 9-point scales. The 9-point scale, which has also been called a hedonic scale (Peryam and Pilgrim, 1957), has been a scale traditionally used in military and civilian food research for more than 30 years (Maller and Cardello, 1984). It was only used for rating the taste of specific ration (food) items. Acceptability ratings for ration attributes other than taste (e.g., acceptability of portion sizes) were obtained on 7-point scales.

The verbal anchors for the 7-point scales were: good-bad, satisfied-dissatisfied, easy-difficult, comfortable-uncomfortable, and like-dislike. The 9-point scale anchors were like-dislike. Each scale had adverb modifiers for the anchors that graduated in intensity. For example, the 7-point good-bad scale was:

1 = VERY BAD, 2 = MODERATELY BAD, 3 = SLIGHTLY BAD, 4 = NEITHER BAD NOR GOOD, 5 = SLIGHTLY GOOD, 6 = MODERATELY GOOD, 7 = VERY GOOD

Each of the scales had a neutral point, and the positive verbal anchors for the scales were at the high ends, i.e., 5-7 for the 7-point scale and 6-9 for the 9-point scale. The product acceptance issues covered a wide range of variables such as durability, appearance, comfort, taste, weight, compatibility (with other pieces of equipment), weatherproofing, warmth, and "overall" acceptability.

Analysis<br />

We based our analysis on randomly selected mean ratings from our survey data. The number of means selected was 155 for the 7-point scale and 21 for the 9-point. The largest sample contributing to any particular mean numbered 347, and the smallest 34. The lowest mean rating on the 7-point scale was 2.94 and the highest was 6.53; the lowest for the 9-point was 3.01 and the highest 6.35. The mean of the means obtained on the 7-point scales was 4.71 (SD = .75), while the mean of the means obtained on the 9-point scale was 4.59 (SD = .91). The distributions for all the selected means were unimodal.

We explored the relationships of the means to the size of the acceptor sets through regression analyses. The acceptor set definition was the percent of ratings falling in the entire positive range for either scale, i.e., 5-7 for the 7-point and 6-9 for the 9-point scale.

Results<br />

The results show extremely good fits with linear regression models for both 7- and 9-point scales. Figure 1 shows the scatter plot for the relationship of the means to the acceptor sets for the 7-point scale. The R² in this case is .97 with a regression equation of:

y = -54.45 + 24.13x

Figure 2's scatter plot shows the relationship of means to acceptor set for the 9-point scale; the R² is .98 and the regression line is:

y = -26.04 + 14.71x
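The fitted equations can be evaluated directly to translate a mean rating into an estimated acceptor set size. The sketch below simply plugs values into the published coefficients; it is an illustration, not a reanalysis of the survey data.

```python
# Sketch: estimate acceptor set size (percent) from a mean rating using the
# regression equations reported above (y = acceptor set %, x = mean rating).

def acceptor_set_7pt(mean_rating):
    return -54.45 + 24.13 * mean_rating

def acceptor_set_9pt(mean_rating):
    return -26.04 + 14.71 * mean_rating

for m in (5.0, 6.0):
    print(f"7-point mean {m}: about {acceptor_set_7pt(m):.0f}% acceptors")
# Means of 5 and 6 on the 7-point scale yield roughly 66% and 90%,
# the values discussed in the Discussion section.
```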

Figure 1. Acceptor set size versus mean ratings, 7-point scale (axis: MEAN RATINGS).

Figure 2. Acceptor set size versus mean ratings, 9-point scale (axis: MEAN RATINGS).

Discussion<br />

When we derive acceptor set sizes from the regression equations for both the 7- and 9-point scales, the sizes of the acceptor sets affirm what we were seeing in our data for individual products. For instance, a mean of 5 on the 7-point scale corresponds to an acceptor set size of 66% of the population, whereas a mean of 6 corresponds with 90%. A far greater number of negative ratings are seen for a mean of 5 as opposed to a 6: the negative and neutral population is decreased by 25% between means of 5 and 6.

As mentioned earlier, the 9-point scale has been used for rating food items in military rations since 1957. Senior researchers who have spent many years in ration acceptability work feel sure they have a very good item if a rating is a 7 ("like moderately"). That is, the 7 is not a good rating merely by default or by the relative stature of the item in the ratings list; the feeling is that an item with a rating of 7 is very acceptable in an absolute sense. Correspondingly, the acceptor set picture for the 9-point scale regression is: a mean rating of 7 shows an acceptor set size of 77%, 7.5 shows 84%, and 8 shows 91%.

The situations described above point to how acceptor sets can aid the definition of product acceptability. If we unshackle the description of product acceptance from the scale verbal anchors, which can make it appear that products are falling somewhat short because they are not achieving perfect scores, it may facilitate definition of product norms that are easier to deal with both intellectually and at gut level.

For instance, if you tell product developers that a product is top of the line if 90% of the populace rates it positively, the statement has an intuitive logic to it. Product developers assume that no one product can please 100% of the population. Even if there were such a product, it would still probably not achieve a perfect rating on any scaled measure because there is a lot at play in the rating game, e.g., raters tend to avoid end points on scales no matter how they feel about a product, frames of reference can be different among raters in regard to a product, and even the mood the rater is in that day can affect his or her rating.

What the norms for products should be, as defined by the size of the acceptor set, i.e., excellent, good, average, or poor, remains to be determined. One approach might be to determine the cumulative distribution frequencies for acceptor sets and think in terms of percentiles. Figure 3 shows the application of this concept to the 7-point scale data; the graph shows that an acceptor set of 45% falls in the 25th percentile, while a set of 74% falls in the 75th percentile. To achieve a product that scores better than 80% of all products tested, an acceptor set size of about 77% is needed. Other qualitative or quantitative data could also be used in conjunction with acceptor set size to establish breakpoint criteria.

Figure 3. Cumulative percent distribution of acceptor set sizes, 7-point scale data (axis: CUMULATIVE PERCENT, 0-100).
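One way to operationalize the percentile idea is to compute the empirical cumulative distribution of acceptor set sizes for previously tested products and read off where a new product falls. The sketch below uses made-up acceptor set values, not the Natick data base.

```python
# Sketch (hypothetical acceptor set sizes, in percent, for previously tested products).
import numpy as np

acceptor_sets = np.array([31, 38, 45, 52, 58, 61, 66, 70, 74, 79, 85, 91])

def percentile_rank(new_value):
    """Percent of previously tested products with an acceptor set at or below new_value."""
    return 100.0 * np.mean(acceptor_sets <= new_value)

print(percentile_rank(45))  # falls at the 25th percentile of this made-up sample
print(percentile_rank(74))  # falls at the 75th percentile of this made-up sample
```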

Going one step further, norms could, and should, be established for different product groups. Some types of products by their nature will never have large acceptor sets, so they should not have to be measured against products that do.

Our findings obviously reinforce the research attesting to the value of the acceptor set to managers in the commercial world concerned with market positioning and product optimization. The military world can also benefit. Although our consumer is a captive consumer, so to speak, there may be some bottom-line applications of the acceptor set that means can't address.

For instance, the U.S. Army can spend around $31,000,000 a year on its standard operational ration. For the sake of hypothesis, assume those who didn't like it didn't eat it. What would that mean in terms of dollars? If you were to assume further that the ration overall had a mean rating of 5 (7-point scale), 66% would be eating it, and if it had a mean rating of 6, 90% would be eating it. The differential of that one scale point amounts to $7,440,000 in uneaten rations. This type of accountability would behoove the developer to improve acceptability in a way that simply looking at means couldn't.
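The dollar differential follows directly from the acceptor set percentages and the assumed annual spend; the arithmetic is sketched below.

```python
# Sketch of the uneaten-ration arithmetic in the example above.
annual_spend = 31_000_000   # dollars per year, the figure assumed in the text
acceptors_mean_5 = 0.66     # acceptor set implied by a mean of 5 (7-point scale)
acceptors_mean_6 = 0.90     # acceptor set implied by a mean of 6

uneaten_difference = annual_spend * (acceptors_mean_6 - acceptors_mean_5)
print(f"${uneaten_difference:,.0f}")   # $7,440,000
```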

Overall, we recommend using an acceptor set to communicate levels of acceptability and to use this measure in tandem with traditional scale statistics. Scale parameters still convey information that dichotomous or other qualitative data cannot. For the manager, however, the acceptor set will provide a far more intuitive grasp of the product findings and a firmer footing for product optimization or market positioning decisions.

The excellent fit of the linear regression model is especially gratifying because of the simplicity it offers. The acceptor set grows linearly with product acceptance means. One integer of improvement in a mean translates into a constant percent change in the acceptor set, namely about 24% on a 7-point scale.

References<br />

Choi, H. S., and Kosikowski, F. V. (1985). Sweetened plain and flavored carbonated yogurt beverages. Journal of Dairy Science, 68, 913.

Gordon, N. M. (1985). A product development, positioning and optimization process guided by organizational objectives. Master of Science Thesis, University of Wisconsin, Madison.

Gordon, N. M. and Norback, J. P. (1985). Choosing objective measures when using sensory methods for optimization and product positioning. Food Technology, 39(11), 96.

Lagrange, V. and Norback, J. P. (1987). Product optimization and the acceptor set size. Journal of Sensory Studies, 2, 119-136.

Maller, O. and Cardello, A. (1984). Ration acceptance methods: measuring likes and their consequences. Nederlands Militair Geneeskundig Tijdschrift, 37(79/110), 91-96.

Peryam, D. R. and Pilgrim, F. (1957). Hedonic scale method of measuring food preferences. Food Technology, 11(9), Supplement 9.



ASVAB, Description<br />

Joint-Service Computerized Aptitude Testing

W. A. Sands*<br />

Director, Testing Systems Department

Navy Personnel Research and Development Center<br />

San Diego, California 92152-6800

INTRODUCTION<br />

The Armed Services Vocational Aptitude Battery (ASVAB) is used by all the U.S.<br />

military services for both enlistment screening and classification into entry-level<br />

training. The current battery includes ten tests. The eight power tests are: General<br />

Science, Arithmetic Reasoning, Word Knowledge, Paragraph Comprehension, Auto<br />

and Shop Information, Mathematics Knowledge, Mechanical Comprehension, and<br />

Electronics Information. The two speeded tests are: Numerical Operations and Coding<br />

Speed. Administration of this conventional, paper-and-pencil test battery takes<br />

between 3 and 3 1/2 hours.

The U.S. Military Entrance Processing Command (USMEPCOM) administers ASVAB under two Department of Defense testing programs. In the Enlistment Testing Program, ASVAB is administered to over 800,000 applicants each year, in approximately 70 Military Entrance Processing Stations (MEPS) and 970 Mobile Examining Team Sites (METS) nationwide. In the Student Testing Program, ASVAB is administered to over 1,000,000 students annually, in over 15,000 schools.

CAT-ASVAB Program<br />

Roles. The U.S. Department of Defense initiated a Joint-Service research

program to develop a Computerized Adaptive <strong>Testing</strong> (CAT) version of the battery<br />

(CAT-ASVAB) in FY 1979. At that time, the Department of the Navy was designated as<br />

Executive Agent, with the Marine Corps as Lead Service. Subsequently, the Lead<br />

Service responsibility was assigned to the Navy. The Navy Personnel Research and

Development Center (NPRDC) was designated as the Lead R&D Laboratory. The Air<br />

Force was assigned responsibility for the development of the large banks of test items<br />

needed for CAT-ASVAB. The Army was assigned responsibility for the procurement,<br />

deployment, and implementation of the full-scale operational testing system.<br />

Objectives. The Joint-Service CAT-ASVAB Program has three objectives: (1)

develop a CAT version of the ASVAB, (2) develop a computer-based delivery system<br />

that will support the new test battery, and (3) evaluate CAT-ASVAB as a potential<br />

replacement for the paper-and-pencil version of the battery (P&P-ASVAB).<br />

* The opinions expressed in this paper are those of the author, are not official, and do not<br />

necessarily represent those of the Navy Department.<br />

245


ACCELERATED CAT-ASVAB PROJECT

Purpose

The Accelerated CAT-ASVAB Project (ACAP) is designed to develop and field-test<br />

CAT-ASVAB in the shortest time possible. The idea is to collect “lessons learned”<br />

information about the new test battery, the delivery system (including both<br />

hardware and software), and the testing environment. The information obtained will<br />

be used to specify the functional requirements for the full-scale, operational system.<br />

Delivery System<br />

The Hewlett-Packard Integral Personal Computer (HP-IPC) was selected for the ACAP System. The HP-IPC is a powerful microcomputer system which, for the examinee testing station, includes: a Motorola 68000, 16/32-bit microprocessor; a graphics co-processor; 1.5 megabytes of Random Access Memory (RAM); a built-in, 3.5-inch, 710K-byte microfloppy disk drive; a 9-inch amber, high-contrast electroluminescent flat screen display, supporting a 255 by 512 pixel, bit-mapped display, and a standard window size of 24 lines by 80 characters; a 90-key, low-profile, detachable keyboard (which was modified by a template, leaving only the necessary keys exposed); and a built-in inkjet printer. The entire computer has a 7 by 16 inch footprint, requiring less than one square foot of desk space. The software, developed at NPRDC, is written in the C programming language, running under a UNIX operating system (HP-UX).

Field Research Activities<br />

The Accelerated CAT-ASVAB Project (ACAP) involves six field research<br />

activities:<br />

Pre-Test. The purpose of this research was to insure that examinees could<br />

easily use the CAT-ASVAB System (including hardware and software). Military recruits and students from high school special education classes were administered

CAT-ASVAB. In the aggregate, they represented the full range of mental ability.<br />

Results were very encouraging. The examinees found CAT-ASVAB easier and faster<br />

than paper-and-pencil tests that they had taken. They liked the fact that it was self-paced,

and involved little writing. Some examinees expressed concern that they<br />

could not skip over items, nor go back to previous items and change their answers.<br />

Some examinees indicated that their eyes became tired, which emphasized the<br />

importance of avoiding glare on the screens. Administration instructions were<br />

revised, based upon information from questionnaires and interviews. This revision<br />

reduced the reading grade level from the eighth to the sixth grade. The Pre-Test was<br />

completed in November 1986.<br />

Medium of Administration. The purpose of this research was to evaluate the<br />

effect of the calibration medium of administration on score precision. The subjects<br />

were from the Navy Recruit Training Center in San Diego. Forty-item conventional<br />

tests were constructed for General Science, Arithmetic Reasoning, Word Knowledge,<br />

Shop Information, and Paragraph Comprehension. Subjects were randomly assigned<br />

to one of three groups. The first group took the tests on computer; these data were<br />

used to obtain a computer-based calibration of items. The second group took the same<br />

246<br />



tests in a paper-and-pencil mode; these data were used to obtain paper-and-pencil<br />

calibration information. Each of these calibrations was used to estimate the ability of<br />

examinees assigned to the third group, who took the tests on computer. Lengthy test<br />

administration time required splitting the study into two phases. General Science,<br />

Arithmetic Reasoning, Word Knowledge, and Shop Information were addressed in the<br />

first phase. Results from this phase showed: (a) no practical differences in the<br />

estimation of abilities; (b) small, but statistically significant differences in different<br />

tests; and, (c) no significant differences in test reliabilities. The second phase<br />

involved the administration of the Paragraph Comprehension test. Data for this<br />

second phase have been collected and analyses are underway.

Cross-Correlation. The purpose of this research was to compare the<br />

measurement precision of CAT-ASVAB and P&P-ASVAB. Subjects were from the Navy

Recruit Training Center, San Diego. Each recruit had taken an operational form of<br />

P&P-ASVAB which was used for enlistment purposes. The total sample was split into<br />

two groups. The first group took CAT-ASVAB Form 1, then CAT-ASVAB Form 2. The

second group took P&P-ASVAB Form 9B, then P&P-ASVAB Form 10B. The second test<br />

for each group was administered about five weeks after the first test. Results indicate<br />

that, despite using substantially fewer items, CAT-ASVAB exhibits significantly<br />

higher alternate form reliability than P&P-ASVAB for most tests, while no P&P-<br />

ASVAB test demonstrates significantly higher reliability than the comparable CAT-<br />

ASVAB test.<br />

Preliminary Operational Check. The purpose of this research was to

demonstrate the communications interface between the ACAP System and USMEPCOM<br />

computer system. The testing procedures were performed jointly by NPRDC and<br />

USMEPCOM personnel at the Seattle MEPS. Data from examinees were loaded onto the<br />

Data Handling Computer at the MEPS, then transferred to the USMEPCOM System-80<br />

minicomputer. Comparison of the data before and after the transfer showed the<br />

procedure was completed with perfect accuracy.<br />

Score Equating Development. The purpose of this research was to equate CAT-ASVAB with P&P-ASVAB. Equating is essential to insure that the two forms of the battery are on the same metric, and that the scores are interchangeable. Subjects were applicants for enlistment at six MEPS (San Diego, Richmond, Seattle, Boston, Omaha, and Jackson) and their satellite MET sites. These six MEPS/METS complexes were selected because, in the aggregate, their applicants are representative of the nation. The operational measures included P&P-ASVAB Forms 10A, 10B, 11A, 11B, 13A, and 13B. There were two forms of the CAT-ASVAB (both non-operational). Finally,

and 13B. There were two forms of the CAT-ASVAB (both non-operational). Finally,<br />

P&P-ASVAB Form 8A was used as the non-operational reference battery. Subjects<br />

were randomly assigned to one of three groups. The first group took CAT-ASVAB<br />

Form 1, then the operational P&P-ASVAB. The second group took CAT-ASVAB Form 2,<br />

then the operational P&P-ASVAB. The last group took the reference battery (P&P-<br />

ASVAB Form 8A), then the operational P&P-ASVAB. In each case, the testing was<br />

done on the same day or on successive days. Data collection, editing, and equating<br />

analyses have been completed. New equating procedures have been developed and<br />

applied. Analyses indicated that composite equatings were unnecessary. Provisional equating tables for operational use in the subsequent Score Equating Verification study were developed. The ACAP microcomputer delivery system has performed

satisfactorily, exhibiting fewer problems than anticipated. Finally, the logistics of<br />

testing in the numerous, heterogeneous MEPS/MET sites nationwide has presented no<br />

insurmountable problems.<br />

247


Score Eauating Verification. The Score Equating Verification study is designed<br />

to evaluate the effect of examinee motivation upon item calibration and equating.<br />

The examinees are applicants for military service who are processing through the<br />

same six MEPS/METS complexes used in the Score Equating Development study. The measures include two forms of CAT-ASVAB and one form of P&P-ASVAB (8A). The

CAT-ASVAB scores are based on the provisional equating tables developed in the<br />

Score Equating Development study, and count as scores of record for enlistment. Data<br />

collected during this study will be used to develop final equating tables for<br />

subsequent operational use. Data collection began on 3 September 1990. This was the<br />

first time that CAT-ASVAB test results have counted as scores of record and, therefore,<br />

determined enlistment eligibility and subsequent training opportunities for<br />

applicants to the military services. Plans call for data collection to be completed in<br />

June 1992, analyses to be completed in November 1992, and results documented by<br />

May 1993.<br />

ENHANCED CAT-ASVAB

Technical Base R&D

During the past several years, each of the Service R&D laboratories has been<br />

investigating computer-administered tests which measure abilities not measured by<br />

the current ASVAB tests. These include measures of psychomotor ability, spatial<br />

ability, and working memory.<br />

Technical Advisory Selection Panel<br />

A Joint-Service Technical Advisory Selection Panel (TASP) was established to<br />

evaluate new computerized tests which showed promise and to nominate Military Occupational Specialties (MOSs) for a Joint-Service Enhanced CAT-ASVAB (ECAT)

validation study. This committee was chaired by a representative of the Defense<br />

Manpower Data Center (DMDC) and included a technical representative from each of<br />

the Services and USMEPCOM. General criteria employed in evaluating the alternative<br />

tests included the theoretical development of the underlying construct, measurement<br />

precision, validity, equating, and operational feasibility.<br />

Joint Service ECAT Validation Study<br />

The TASP recognized that the amount of testing time available in the field was<br />

limited, and that not all promising tests could be administered. Therefore, the tests<br />

are grouped into primary and secondary categories. The primary group, to be<br />

administered to all examinees, includes: (1) Integrating Details, (2) Target<br />

Identification, (3) Figural Reasoning, (4) Two-Hand Tracking, and (5) Sequential<br />

Memory. The secondary group includes: (1) Assembling Objects, (2) Orientation, (3)<br />

One-Hand Tracking, and (4) Mental Counters. These secondary tests will be<br />

administered only in those situations where time permits.<br />

The Military Occupational Specialties (MOSs) involved in the Army include: Infantryman (11H), Cannon Crewman (13F), and Tank Crewman (19K). Air Force jobs include: Air Traffic Controller (27230) and Personnel Specialist (73230). The Marine Corps MOSs will include: Motor Transportation (35XX) and Aircraft Maintenance (61XX). Finally, the Navy ratings will include: Air Traffic Controller (AC),

Operations Specialist (OS), Fire Controlman (FC), Electronics Technician/Advanced<br />

Electronics Field (ET (AEF)), Radioman (RM), Engineman (EN), Aviation Structural<br />

Mechanic - Structures (AMS), Aviation Electrician’s Mate (AE), Aviation Electronics<br />

Technician (AT), Aviation Fire Control Technician (AQ), Aviation Antisubmarine<br />

Warfare Technician (AX), Aviation Ordnanceman (AO), Gunner’s Mate - Phase I<br />

(GMG), Machinist’s Mate (MM), and Electrician’s Mate (EM).<br />

Data collection for the Joint-Service ECAT Validation study began in February<br />

1990 and will continue through August 1991, with analyses, documentation, and<br />

reviews scheduled for completion in July 1992.<br />

Navy ECAT Validation Study<br />

The purpose of this study is to determine the incremental validity of new<br />

predictor tests for augmenting ASVAB for selected Navy ratings. It will provide<br />

additional information to the Joint-Service ECAT Validation study described above for<br />

assessing the cost-effectiveness of computerized testing.<br />

The experimental test battery includes six tests, followed by a seven-item<br />

questionnaire. Average administration time is 2 1/2 hours. The six tests are: (1)

Mental Counters, (2) Sequential Memory, (3) Integrating Details, (4) Space<br />

Perception, (5) Spatial Reasoning, and (6) Perceptual Speed. The short questionnaire<br />

is designed to obtain information on examinee fatigue, motivation, and computer<br />

experience.<br />

Data have been collected from the following Navy schools: Operations<br />

Specialist (OS), Aviation Structural Mechanic - Structures (AMS), Aviation<br />

Ordnanceman (AO), Aviation Electronics Technician (AT), Aviation Fire Control<br />

Technician (AQ), Aviation Antisubmarine Warfare Technician (AX), Gunner's Mate -

Phase I (GMG), Machinist’s Mate (MM), Propulsion Engineering Basics, Aviation<br />

Machinist’s Mate (AD), Boiler Technician (BT), Hospitalman (HM), and Hull<br />

Maintenance Technician (HT).<br />

Data collection for the Navy ECAT Validation study has been completed.<br />

Analyses, documentation, and review of the results should be completed in December<br />

1990.<br />

CONCEPT OF OPERATIONS<br />

The concept of operations for the CAT-ASVAB System has not been finalized.<br />

In a previous study, four alternative deployment strategies were selected for special<br />

attention: (1) Centralized CAT-ASVAB testing at MEPS, with elimination of all METS<br />

testing; (2) High Volume Site Testing (all MEPS and 273 METS); (3) use of a CAT

screening instrument at the military recruiting stations, with subsequent full CAT-<br />

ASVAB testing of screened personnel at MEPS, and (4) administration of CAT-ASVAB<br />

in mobile vans, testing at MEPS and fifty high-volume METS. The current<br />

operational scenario involving the administration of P&P-ASVAB in all MEPS and<br />

METS provided a baseline case for comparison purposes.<br />

249


ECONOMIC ANALYSES<br />

Department of Defense and Department of the Navy regulations require<br />

performing an economic analysis to assist in determining whether or not a system<br />

is cost-effective. An initial study was conducted by a contractor, whose<br />

representatives visited each of the MEPS in the continental United States to collect<br />

cost information in four areas: (1) development, (2) procurement, (3)<br />

implementation, and (4) operations and support.<br />

The Brogden-Cronbach-Gleser approach to test utility evaluation was<br />

employed. This approach assesses the dollar utility of the incremental validity of a<br />

new instrument (e.g., CAT-ASVAB) over the validity of an existing instrument (e.g.,

P&P-ASVAB) in terms of improved performance. A conventional ten-year economic<br />

life was used, and the net life cycle benefit computed for each alternative concept of<br />

operation. The incremental validity used for CAT-ASVAB was 0.002, a conservative<br />

estimate based upon simulation results assessing the increased precision of CAT-<br />

ASVAB over P&P-ASVAB. The results appear promising for two concepts of operation:<br />

centralized testing, and the recruiter screening approach. The high-volume site and<br />

mobile van concepts were not cost-effective.<br />
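For readers unfamiliar with the utility model, the sketch below shows one common form of the Brogden-Cronbach-Gleser incremental-utility calculation. Only the incremental validity (0.002) and the ten-year economic life come from the study described here; every other quantity is an illustrative placeholder, not a figure from the contractor's analysis.

```python
# Hedged sketch of a Brogden-Cronbach-Gleser incremental-utility calculation.
# Only delta_validity (0.002) and years (10) come from the text above; the
# remaining inputs are illustrative placeholders.

def incremental_utility(years, n_selected, sd_y, delta_validity, mean_z_selectees, delta_cost):
    """One common form of the model: dollar gain from the validity increment minus added cost."""
    return years * n_selected * sd_y * delta_validity * mean_z_selectees - delta_cost

gain = incremental_utility(
    years=10,                # economic life used in the study
    n_selected=300_000,      # placeholder: selectees credited per year
    sd_y=10_000,             # placeholder: dollar SD of job performance
    delta_validity=0.002,    # incremental validity assumed for CAT-ASVAB
    mean_z_selectees=0.8,    # placeholder: mean standardized predictor score of selectees
    delta_cost=30_000_000,   # placeholder: added life-cycle cost of the new system
)
print(f"Net life-cycle benefit (illustrative): ${gain:,.0f}")
```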

A pivotal issue in these economic analyses is the actual increment in validity<br />

which can be expected by using CAT-ASVAB instead of P&P-ASVAB. While simulation<br />

results were adequate for initial analyses, empirical data are necessary for any<br />

conclusive evaluation. Therefore the Manpower Accession Policy Steering<br />

Committee (MAPSC) instructed the Executive Agent to evaluate new tests which<br />

offered significant promise for enhancing the predictive effectiveness of the

current battery.<br />

A final economic analysis study will be performed under contract. Results

from this study will be crucial in determining whether or not CAT-ASVAB should be<br />

implemented nationwide.<br />

250


ASSESSMENT OF APTITUDE REQUIREMENTS FOR<br />

NEW OR MODIFIED SYSTEMS<br />

Lawrence H. O’Brien<br />

Dynamics Research Corporation<br />

Wilmington, MA.

INTRODUCTION<br />

Recent Department of Defense initiatives on manpower, personnel, and training

call for an assessment of the “aptitude requirements of new systems.” For example,<br />

AR 602-2, Manpower and Personnel Integration (MANPRINT) in the Materiel<br />

Acquisition Process, requires that “For material with a predominant human<br />

interface, it is critical to collect and evaluate human performance reliability data to<br />

determine whether the proposed system concept will deliver expected<br />

performance with no greater aptitudes and no more training than planned.” DOD<br />

Directive 5000.53, Manpower, Personnel, Training, and Safety in the Defense<br />

Acquisition Process, requires that descriptions of the “quality and quantity of<br />

military personnel” needed to field a system be developed and updated during the<br />

acquisition process. The directive indicates that the descriptions of military<br />

personnel quality requirements “shall include distributions of skill, grade, aptitude,<br />

anthropometric and/or physical attributes, education, and training backgrounds.”<br />

KEY QUESTIONS RELATED TO APTITUDE ASSESSMENT FOR NEW SYSTEMS<br />

Aptitude assessments for new weapon systems seek to address two basic questions:<br />

Question 1: Can the system be successfully operated and maintained by the soldiers who are expected to man it?

To determine if the system is successful, one must (a) identify the functions that<br />

the system is supposed to perform, (b) identify the measures that can be used to<br />

assess performance on these functions, (c) establish criteria for these measures,<br />

(d) either collect "test" data on or estimate system performance, and (e) compare

the system performance with the criteria. If performance exceeds the criteria, the<br />

system is judged successful.<br />

The term “by the soldiers who are expected to man it” implies detailed<br />

consideration of soldier characteristics such as aptitudes. More specifically, it<br />

assumes that data will be obtained from soldiers who are “representative” of the<br />

soldiers who will actually be assigned to the system.

To identify “representative” soldiers, one must first identify the key personnel<br />

characteristics which impact soldier performance. Aptitudes such as scores on<br />

Armed Services Vocational Aptitude Battery are especially important because they<br />

are used by the Army to control entry into the Army or MOS. This is accomplished<br />

by setting cut-offs or minimum acceptable scores on these characteristics.<br />

The best way to select “representative” soldiers for inclusion in system testing is to<br />

randomly sample from a population that has the same distribution of these<br />

characteristics as the population of soldiers who are expected to man the system.

However, the future distribution of these characteristics within a particular MOS may be different from their current distribution. Since most Army systems take 5-10

years to develop, the capability to estimate the future distribution of key personnel<br />

characteristics is a critical prerequisite for describing the soldiers who are likely to<br />

be available .to man the system.<br />

Estimating the future distributions of these aptitudes is not simple since these<br />

distributions are impacted by a number of factors. First, the distributions are<br />

impacted by the cutoffs that the Army sets for these aptitude measures. These<br />

cutoffs eliminate soldiers who score below the cutoffs both from accessions and<br />

from distributions of the aptitudes at higher paygrades. However, the cutoffs are<br />

not the only factors determining these distributions. The distributions are also<br />

impacted by the distribution of the aptitudes in different subpopulations of the<br />

general population at a particular point in time, the propensity of those<br />

subpopulations to enlist at various aptitude levels, and the rates (e.g., reenlistment

rates) with which the subpopulations transition through the Army personnel<br />

system.<br />


Question 2: Can the system be operated and maintained within available manpower, personnel, and training resource constraints?

This question seeks to assess the “personnel affordability” of the new system. The<br />

resource capabilities of the Army are limited. The total end strength of the Army is<br />

fixed annually by Congress. The Army’s capability to recruit high quality personnel<br />

is restricted by the recruiting budget. To effectively deal with these resource<br />

limitations, the Army must set constraints for critical resources such as personnel.<br />

During the acquisition process, resource requirements for the new system must be<br />

established and compared with the constraints. If the requirements do not exceed<br />

the constraints, the system is affordable; otherwise, it is not. Manpower constraints

describe the maximum number of people who will be available to man the new<br />

system. Personnel constraints describe: (a) expected cutoff values for key<br />

characteristics such as aptitude, and (b) the expected distribution of these<br />

characteristics above the cutoff.<br />

APPROACH FOR ASSESSING APTITUDE IMPACTS ON SYSTEM PERFORMANCE<br />

The relationship between aptitudes and system performance is not a direct one.<br />

Aptitudes impact the performance of the tasks required to operate or maintain the<br />

system. Performance on these individual tasks determines overall system<br />

performance. Assessing the relationships between the performance of individual<br />

system tasks and system performance requires consideration of the complex causal<br />

and sequential relationships among tasks. Task performance will vary as a function<br />

of the conditions under which the tasks will be performed. These conditions will<br />

vary across time and across scenarios.<br />

Measures of System Performance. A number of metrics can be used to quantify

system performance. Typically, two types of measures are developed: operational<br />

effectiveness (e.g. mission performance time or success) and system availability<br />

(e.g. system reliability, availability, or maintainability).<br />

252



APTITUDE ASSESSMENT TOOLS FOR NEW SYSTEMS

As part of the HARDMAN III program, the Army Research Institute (ARI) has developed two microcomputer-based tools that can be used to assist Army analysts in identifying aptitude requirements and constraints for new systems: the Personnel Constraints Aid or P-CON and the Personnel-Based System Evaluation Aid or PER-SEVAL.1

P-CON. The P-CON Aid estimates personnel quality constraints. More specifically, it

estimates the future distribution of key personnel characteristics. These<br />

distributions describe the numbers and percentage of personnel that will be<br />

available at each level of the personnel characteristics. The P-CON Aid also<br />

provides guidance to help Army analysts and contractors understand the impacts of

setting constraints at different personnel characteristic levels. For example, the<br />

P-CON Aid will display the levels of performance that can be expected at each of<br />

these levels. An analyst can use the information on expected performance to set

personnel constraint levels for each characteristic.<br />

The P-CON Aid first estimates what the future distribution of the personnel<br />

characteristics will be. Then, it uses results from analyses of the Project A data<br />

base to show what levels of performance are achievable at different characteristic<br />

levels. The user may then use the information on both personnel availability and<br />

performance to identify minimum acceptable levels for each personnel<br />

characteristic.<br />
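A toy version of that trade-off display might simply tabulate, for each candidate cut-off, the percentage of soldiers projected to remain available and the performance they could be expected to deliver. All numbers and the performance function in the sketch below are hypothetical placeholders, not P-CON's projections or the Project A functions.

```python
# Hypothetical sketch of a P-CON style cut-off trade-off table:
# availability versus expected performance at each candidate cut-off.
import numpy as np

rng = np.random.default_rng(3)
projected_scores = rng.normal(loc=100, scale=20, size=10_000)  # placeholder projected composite scores

def expected_performance(scores):
    """Placeholder performance function standing in for the Project A based functions."""
    return 50 + 0.3 * (scores - 100)

for cutoff in (85, 95, 105):
    available = projected_scores >= cutoff
    pct_available = 100 * available.mean()
    mean_perf = expected_performance(projected_scores[available]).mean()
    print(f"Cut-off {cutoff}: {pct_available:.0f}% available, expected performance {mean_perf:.1f}")
```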

PER-SEVAL. The PER-SEVAL Aid determines what level of personnel<br />

characteristics is needed to meet system performance requirements given a<br />

particular contractor’s design, fixed amounts of training, and the specific<br />

conditions of performance under which the system tasks will be performed.<br />

The PER-SEVAL Aid has three basic components. First, PER-SEVAL has a set of

performance shaping functions that predict performance as a function of ASVAB<br />

area composite and training. Separate functions are provided for different types of<br />

tasks. The primary data source for developing the functions was the results of regression analyses of the Project A data base. Second, the PER-SEVAL Aid has

a set of stressor degradation algorithms that degrade performance to reflect the<br />

presence of critical environmental stressors. Third, the PER-SEVAL Aid has a set<br />

of operator and maintainer models that aggregate the performance estimates of<br />

individual tasks and produce estimates of system performance.<br />
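A highly simplified sketch of that three-part flow appears below: predict task performance from an aptitude composite and training, degrade it for stressors, and aggregate task estimates into a system-level estimate. The functions, coefficients, and the simple product aggregation are placeholders, not the actual PER-SEVAL shaping functions, stressor algorithms, or operator and maintainer models.

```python
# Hypothetical sketch of the PER-SEVAL flow described above.

def shaped_performance(aptitude_composite, training_hours):
    """Placeholder performance-shaping function: probability of successful task completion."""
    p = 0.30 + 0.004 * aptitude_composite + 0.002 * training_hours
    return min(p, 0.99)

def degrade_for_stressors(p, stressor_factors):
    """Placeholder stressor degradation: multiply by one factor per stressor present."""
    for factor in stressor_factors:
        p *= factor
    return p

def system_performance(task_probs):
    """Placeholder aggregation model: the mission succeeds only if every critical task succeeds."""
    result = 1.0
    for p in task_probs:
        result *= p
    return result

tasks = [shaped_performance(100, 40) for _ in range(3)]          # three critical tasks
tasks = [degrade_for_stressors(p, [0.95, 0.90]) for p in tasks]  # e.g., heat and protective gear
print(f"Estimated system performance: {system_performance(tasks):.2f}")
```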

RECONCILING THE JOB-BASED AND SYSTEM-BASED APPROACHES TO

APTITUDE REQUIREMENTS ASSESSMENT<br />

Assessment of aptitude requirements requires consideration of the impact of<br />

aptitudes on “performance”. The personnel and the system development<br />

communities have different conceptualizations of performance. The personnel<br />

community tends to focus on “job performance” while the system development<br />

1 HARDMAN III is a major developmental effort of ARI’s System Research<br />

Laboratory. Its objective is to develop a set of automated aids to assist Army analysts<br />

in conducting MANPRINT assessments during the Materiel Acquisition Process<br />

(MAP).<br />

253<br />

.


community tends to focus on "system performance." Traditionally, most of the

previous work on assessing aptitude requirements has been based on the job performance perspective. Yet, aptitude requirements (i.e., ASVAB area composite cutoffs) are set for occupational specialties, not weapon systems. The tasks

associated with a particular weapon system may only constitute a subset of the total<br />

amount of tasks assigned to a particular occupational specialty.<br />

Figure 1 displays a strategy for linking the job-based and system-based approaches<br />

for aptitude assessment. Prior to the development of the new system, the<br />

personnel community will set an ASVAB area composite cut-off for each MOS. It is assumed

that the process for setting this cut-off will include consideration of the impact of<br />

the cut-off on "job performance." During the system development process, the P-

CON and PER-SEVAL tools can be applied to determine the impact of the cut-off on<br />

system performance. P-CON can be used to project what the future distribution of<br />

personnel will be at or above the cut-off and PER-SEVAL can be used to determine<br />

if this population can successfully meet system performance requirements. If system

system performance is adequate, no change in aptitude cut-off is needed. If system<br />

performance is not adequate, the possibility of using higher cut-offs can be<br />

examined. The P-CON tool can be used to examine the impact of higher cut-offs on<br />

personnel availability (i.e., the numbers of people at or above the cut-off). P-CON

outputs can be used to assess the impact of personnel availability on the Army’s<br />

ability to provide the manpower to successfully man the new system. Another ARI<br />

tool, the Army Manpower Cost System or AMCOS, can be used to assess the

personnel costs associated with recruiting higher aptitude personnel. The<br />

information on system performance, personnel availability, and personnel costs can<br />

then be used by the personnel community in reassessing the MOS cut-off. It is<br />

assumed that this assessment will consider the impact of the aptitude change on<br />

total “job performance.”<br />

Figure 1. Potential relationships between job and system perspectives (flow diagram: the personnel community sets the MOS cut-off; the impact of higher cut-offs on availability and cost is assessed; the personnel community reassesses the cut-off).


Currently, most personnel psychologists view job performance as a multidimensional construct. For example, using data obtained from the Project A study, Campbell, McHenry, and Wise (1990) have developed a model of Army job performance that has five factors: core technical proficiency, general soldier proficiency, effort and leadership, personal discipline, and physical fitness and military bearing. Clearly, performance on system performance tasks is closely related to one of these components, technical proficiency. As Sadacca, Campbell, Difazio, Schultz, and White (1990) have pointed out, the utility of the different job components may vary across jobs. The need to raise a particular MOS ASVAB cut-off will depend on how much importance Army decision makers attach to technical proficiency vice the other job components for the particular MOS being investigated.

REFERENCES

Army Regulation 602-2, 19 April 1990, Manpower and Personnel Integration (MANPRINT) in the Materiel Acquisition Process.

Campbell, J. P., McHenry, J. J., and Wise, L. L. (1990). Modeling job performance in a population of jobs. Personnel Psychology, 43, 313-333.

DOD Directive 5000.53, 30 December 1988, Manpower, Personnel, Training, and Safety in the Defense Acquisition Process.

Sadacca, R., Campbell, J. P., Difazio, A. S., Schultz, S. R., and White, L. A. (1990). Scaling performance utility to enhance selection/classification decisions. Personnel Psychology, 43, 367-378.

System Research and Applications Corporation. (1990). Army Manpower Cost System Active Component Life Cycle Cost Estimation Model Information Book. Arlington, VA.

255<br />



The Practical Impact of Selecting TOW Gunners<br />

with a Psychomotor Test<br />

Amy Schwartz and Jay Silva<br />

The U.S. Army Research Institute for the<br />

Behavioral and Social Sciences<br />

The ongoing reduction in defense forces has focused the<br />

interest of Army management on how to maintain current deterrent<br />

and combat power with fewer soldiers. One approach is to improve<br />

the person-to-job match in entry positions. A better match may<br />

lead to lowered attrition and better performance among those who

are selected. New selection tests, developed through the Army's<br />

Project A (Campbell, 1990), have been shown to contribute<br />

significantly to the prediction of training performance in a<br />

variety of MOS (e.g., Busciglio, 1990; Busciglio, Silva & Walker,<br />

1990). If these tests were used to classify recruits who have<br />

been selected into a family of MOS, an increase in assignment<br />

efficiency into specific MOS could result.<br />

One application of a newly developed psychomotor test is the<br />

prediction of 11H TOW (Tube-launched, Optically-tracked, Wire-guided)

gunner performance. Currently, recruits are accessioned<br />

into the generic MOS 11X (Infantryman) using the Combat (CO)<br />

composite of the Armed Services Vocational Aptitude Battery<br />

(ASVAB). They are later classified into one of four Infantry MOS

including 11H TOW gunners. Previous research found that<br />

psychomotor tests, especially one which required two-hand<br />

tracking of a target (Two-hand Tracking test), accounted for a<br />

significant amount of variance of simulated gunnery performance<br />

beyond that explained by the ASVAB Combat composite for TOW<br />

gunners (Silva, 1989).<br />

The present analyses examined the practical benefits of<br />

using scores on a psychomotor test to select TOW gunners. First,<br />

the potential performance gains that can be accomplished with the<br />

additional test were examined. However, performance gains for<br />

11H's may result in decreases in the quality of recruits in the<br />

remaining Infantry MOS. Determining the overall effect of<br />

implementing the new test ideally would require criterion

performance data for all Infantry MOS. Since these data were not<br />

available, the impact of the additional test was examined by<br />

comparing general quality of recruits selected into the 11H MOS<br />

with that of the remaining recruits who would be assigned to the<br />

other MOS in the 11 series. Armed Forces Qualifications Test<br />

(AFQT) scores, which are currently an accepted measure of<br />

quality, were used for this comparison. Thus, the purpose of<br />

this research is to demonstrate the contribution of Two-hand<br />

Tracking to predicting TOW gunner performance, while considering<br />

the general impact of implementing the new tests for<br />

classification purposes.<br />

256<br />



METHOD

Sample

The sample consisted of 911 recruits initially selected as<br />

11X Infantrymen based on a minimum CO composite score of 85 who<br />

were then classified as 11H TOW gunners. For the present<br />

purposes, the 11H's were assumed to have been randomly chosen from 11X's and therefore to have the same properties as the 11X population. In order to test this assumption, t-tests were conducted comparing the AFQT and CO mean scores of the current sample (AFQT M = 56.66, CO M = 109.71) with those of a sample of 17,000 11X's (AFQT M = 57.82, CO M = 110.22), and there were no significant differences in the means. Because of this comparability, the current sample of 11H's was considered to be representative of the total 11X population.
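The representativeness check reported here is an independent-samples t-test on the AFQT and CO means of the two groups. A sketch of how such a check might be run is shown below, using randomly generated placeholder scores rather than the actual 911 and 17,000 records.

```python
# Sketch of the representativeness check (placeholder data, not the actual records).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
afqt_11h = rng.normal(loc=56.66, scale=20, size=911)      # placeholder 11H sample
afqt_11x = rng.normal(loc=57.82, scale=20, size=17_000)   # placeholder 11X comparison group

t, p = stats.ttest_ind(afqt_11h, afqt_11x, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")  # a non-significant p is consistent with representativeness
```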

Procedure<br />

Recruits were given the Two-hand Tracking test along with<br />

other psychomotor measures during in-processing at the Reception<br />

Battalion. Classification of the examinees into specific MOS was<br />

not based on Two-hand Tracking scores. The procedure for<br />

assignment appears to be based on demand from each of four<br />

possible assignment MOS. During TOW gunnery training, gunnery<br />

data were collected using high-fidelity gunnery simulators.<br />

Measures<br />

Two-hand Tracking. This test measures two-hand coordination on a<br />

scale of distance from target accuracy. This score has been<br />

standardized (T distribution) and inverted such that a higher<br />

score indicates better performance than a lower score.<br />

Combat. This composite is the sum of four standardized ASVAB<br />

subtests: Arithmetic Reasoning (AR), Auto and Shop Information (ASI), Coding Speed (CS), and Mechanical Comprehension (MC). A

score of at least 85 is needed to qualify for the 11X MOS.<br />

Combined Score. This is the optimally weighted predicted<br />

composite of both Two-hand Tracking and Combat.<br />

Training course performance. The criterion scores indicate<br />

performance on a TOW anti-tank gunnery simulator which requires<br />

the gunner to track a moving target (i.e., a target mounted on a<br />

moving vehicle) through an infrared optical device. The two<br />

measures of interest in the present study include Event 3, the<br />

trainee's score on the first qualifying set (an index of time on<br />

target) and Pass 3, whether the trainee passed or failed on the<br />

first qualifying set.<br />

257


RESULTS AND DISCUSSION<br />

Table 1 shows the correlations among the predictors and<br />

criteria. The joint effect of using both CO and Two-hand<br />

Tracking scores has been included under the heading 'combined.'<br />

The correlations of 'combined' with Combat and Two-hand Tracking<br />

are provided as an indication of the weights of each predictor in<br />

the optimal linear combination.<br />

To evaluate the practical significance of this method it must first be demonstrated that the proposed predictors will improve the prediction of training performance. This is supported by the multiple regression results (see Table 2). The predictors significantly explain variance in performance on Event 3 both when used alone [CO: F(1,909) = 49.84; Two-hand Tracking: F(1,909) = 94.06] and in combination [F(2,908) = 56.69], all p < .0001.


Table 1
Correlation Matrix of Predictors and Criteria

                 Event 3   Pass 3   Combat   Two-hand
Pass 3            .76**
Combat            .23**     .12**
Two-hand          .31**     .23**    .34**
Combined          .33**     .23**    .68**    .92**

Note. **p < .0001. "Combined" represents the correlation between the predictor and the predicted values based on a linear combination of both predictors.

A second practical concern is the quality of the remaining<br />

recruits to be assigned to the MOS in the 11X series. If all of<br />

the recruits who score high on the additional tests are placed<br />

into one MOS, the remaining MOS will receive less qualified<br />

individuals. It has already been demonstrated that Two-hand<br />

Tracking is as good (if not better) a predictor of TOW gunner<br />

performance as CO. It remains to be shown that selection based<br />

on Two-hand Tracking will lead to less of a decrease in the<br />

quality of the remaining recruits than CO.<br />

Table 2
Predicting Performance on Event 3 Using CO and Two-Hand Tracking

Model                       R square      df       F
CO                            .052       1/909    49.84**
Two-Hand Tracking             .094       1/909    94.06**
CO & Two-Hand Tracking        .111       2/908    56.69**

Note. **p < .0001.
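The comparison in Table 2 is a standard incremental R-squared check: fit CO alone, then add Two-hand Tracking and observe how much additional variance is explained. The sketch below illustrates the computation with simulated placeholder data, not the 911 gunner records.

```python
# Sketch of the incremental R-squared logic behind Table 2 (simulated placeholder data).
import numpy as np

rng = np.random.default_rng(1)
n = 911
co = rng.normal(size=n)                                # placeholder Combat composite (standardized)
tht = rng.normal(size=n)                               # placeholder Two-hand Tracking (standardized)
event3 = 0.15 * co + 0.27 * tht + rng.normal(size=n)   # placeholder Event 3 criterion

def r_squared(X, y):
    """R-squared from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

r2_co = r_squared(co.reshape(-1, 1), event3)
r2_both = r_squared(np.column_stack([co, tht]), event3)
print(f"R2 (CO only): {r2_co:.3f}; R2 (CO + Two-hand): {r2_both:.3f}; increment: {r2_both - r2_co:.3f}")
```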


Table 3
Mean Performance at Cutoffs for AFQT, Actual and Predicted Scores on Event 3 and Passing Rate at Event 3

Predictor              SR     AFQT    Actual Score   Pred. Score   Pass/fail
                                      at Event 3     at Event 3    Event 3
Combat                 20%    78.14     646.95         650.96        .879
                       50%    68.40     633.38         633.52        .850
                       80%    61.38     620.44         619.49        .834
Two-hand Tracking      20%    63.75     650.66         659.04        .912
                       50%    60.79     640.39         640.35        .890
                       80%    59.51     627.41         624.20        .853
Combat & Two-Hand      20%    71.10     654.57         664.02        .907
  Tracking             50%    64.97     643.38         643.15        .881
                       80%    60.17     625.28         625.40        .848
No Selection                  56.66     609.94                       .814

Raising the cutoff on CO would lead to higher mean AFQT scores<br />

for llH, especially at lower SRs. This would lead to a depletion<br />

of high ASVAB quality recruits for the other Infantry MOS.<br />

Selection based on Two-hand Tracking scores also increases the mean<br />

AFQT score for llK, but to a much lesser extent than either CO or<br />

the two predictors combined. For example, a 50% SR on Two-hand<br />

Tracking would produce a higher pass rate on Event 3 than a 20% SR<br />

using CO, yet it would lead to much less of an increase in mean<br />

AFQT scores. Therefore, Two-hand Tracking, compared to CO, is<br />

better able to minimize AFQT impact while improving outcomes for<br />

11H's.<br />
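The trade-off summarized in Table 3 can be illustrated with a small simulation; the data, the predictor-AFQT correlation, and the selection ratios below are invented for the example and are not the study's values.

```python
# Illustrative sketch (invented data): mean AFQT of the recruits selected on a
# classification test at several selection ratios (SRs), and of the pool that
# remains, mirroring the logic of Table 3.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
afqt = rng.normal(60, 15, n)                    # hypothetical AFQT scores
predictor = 0.4 * afqt + rng.normal(0, 12, n)   # hypothetical test, moderately related to AFQT

for sr in (0.20, 0.50, 0.80):
    cutoff = np.quantile(predictor, 1 - sr)     # cutoff that admits the top `sr` fraction
    selected = predictor >= cutoff
    print(f"SR {sr:.0%}: mean AFQT selected = {afqt[selected].mean():5.1f}, "
          f"mean AFQT remaining = {afqt[~selected].mean():5.1f}")
```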

While the current research demonstrates a smaller depletion of

AFQT scores in remaining MOS when Two-hand Tracking is used as a<br />

classification test, future research must be conducted to evaluate<br />

the potential impact of this system on training or on-the-job<br />

260<br />



performance criteria. Classification is most efficient when<br />

different skills are required for the jobs being filled. If two-hand tracking is equally important for all Infantry positions, selecting TOW gunners with this test may not be appropriate and will result in a depletion of necessary tracking skills among recruits in the other 11X-series MOS. Follow-up work can examine this by collecting

performance data from several Infantry MOS and examining the<br />

results of using a battery of tests to determine assignment.<br />

These results assume that training performance reflects on-the-job performance. Some initial results in the field indicate that this is true. In addition, there is some limitation in the

effectiveness of Two-hand Tracking as a predictor in this context,<br />

since the sample was already preselected based on a CO cutoff of<br />

85. More gains would most likely be found if psychomotor tests are<br />

given before recruits are assigned to even a family of MOS.<br />

However, the present results suggest that with only a slight<br />

modification of the present system, the addition of a psychomotor<br />

test can lead to improved selection without greatly depleting the<br />

quality of the recruits remaining for assignment in the remaining<br />

MOS.<br />

References<br />

Busciglio, H. H. (1990). The Incremental Validity of Spatial and Perceptual-Psychomotor Tests Relative to the Armed Services Vocational Aptitude Battery (ARI Technical Report 883). Alexandria, VA: U.S. Army Research Institute.

Busciglio, H. H., Silva, J., and Walker, C. (1990). The Potential of New Army Tests to Improve Job Performance. Paper presented at the 1990 Army Science Conference.

Campbell, J. P. (1990). An overview of the Army selection and classification project (Project A). Personnel Psychology, 43, 231-239.

Silva, J. M. (1989). Usefulness of Spatial and Psychomotor Testing for Predicting TOW and UCOFT Gunnery Performance (ARI Working Paper WP-RS-89-21). Alexandria, VA: U.S. Army Research Institute.

261<br />

..



VALIDATION OF A NAVAL OFFICER SELECTION BOARD<br />

Captain J.P. Bradley<br />

Canadian Forces Personnel Applied Research Unit<br />

Willowdale, Ontario, Canada M2N 6B7<br />

Introduction<br />

In 1976, the Canadian Navy established the Naval Officer Interview Board (NOIB) for the purpose of selecting applicants for the Maritime Surface and Sub-surface (MARS) and Maritime Engineer (MARE) officer occupations. There were two components to the NOIB: a selection interview, conducted by a panel of senior naval

officers, and an orientation program, consisting of tours of naval<br />

facilities and briefings by naval officers. The purpose of the<br />

orientation component was to ensure that candidates would be able<br />

to make an informed decision to join the Navy if selected by the<br />

NOIB.<br />

By 1983, the NOIB had not reduced attrition among MARS and<br />

MARE trainees; therefore, the Naval Officer Selection Board (NOSB)<br />

was developed. The NOSB retained the orientation component and<br />

interview of the former NOIB but incorporated other assessment<br />

instruments to achieve a multi-method approach to the assessment<br />

of naval officer potential. In 1989, the NOSB was renamed the<br />

Naval Officer Assessment Board (NOAB).<br />

To become qualified MARS officers, candidates must complete<br />

four phases of training; the Basic Officer Training Course (BOTC),<br />

required of all Canadian Forces (CF) officer applicants regardless<br />

of military occupation, and three phases of MARS occupation<br />

qualification training. An evaluation of the NOAB's ability to<br />

predict success on BOTC by Okros, Johnston, and Rodgers (1988)<br />

demonstrated that: (a) the NOAB predicted BOTC performance better<br />

than CF recruiting centre (CFRC) measures; (b) the optimal<br />

combination of predictors produced a multiple correlation of .40;<br />

and (c) the file review was identified as the best single NOAB<br />

predictor of BOTC with a correlation of .31. The present study<br />

complements the BOTC validation study and examines the ability of<br />

the NOAB to predict MARS occupation training success.<br />

Method

Subjects

Of the 743 MARS candidates who have attended the NOAB, the 95<br />

who have gone on to complete all phases of MARS training comprised<br />

the sample for this validation study. The subjects in this study<br />

were male. Female applicants have attended the NOAB since 1988,<br />

but none have completed MARS occupational training to date.<br />

262


Variables<br />

Criteria. Two measures of success on MARS occupation training<br />

were used: (a) grades on the third phase (MARS III); and (b) grades<br />

on the fourth phase (MARS IV) of MARS training.

Predictors. Operational predictors used by the NOAB to assess<br />

MARS candidates included: (a) an interview; (b) a file review (an

evaluation of the biographical data collected by the CFRCs); (c)<br />

a conducting officer's assessment; (d) performance in a practical<br />

leadership exercise; (e) performance in a leaderless group<br />

discussion; and (f) a NOAB merit score (a weighted combination of

NOAB measures). Experimental predictors included: (a) the Problem<br />

Sensitivity Test (PST); and (b) the Passage Planning Test (PPT).<br />

CFRC predictors included: (a) a military potential score provided

by CFRC staff; and (b) a measure of tested learning ability based<br />

on the CF General Classification (GC) Test. The relations between<br />

BOTC performance and MARS training success were also evaluated.<br />

Results

Predicting MARS III Performance

Although Table 1 shows statistically significant correlations<br />

between MARS III results and three NOAB predictors -- file review,<br />

leadership stands, and the NOAB merit score -- multiple regression<br />

analyses revealed that the leadership stands did not provide any<br />

incremental prediction beyond that contributed by the file review<br />

(R = .20). In essence, the prediction afforded by the merit score<br />

is that provided by the file review. MARS III performance was<br />

unrelated to the following measures: (a) the interview; (b) the<br />

conducting officer's assessment; (c) the leaderless group<br />

discussion; (d) the CFRC military potential score; (e) tested<br />

learning ability; and (f) performance on BOTC.<br />

Predicting MARS IV Performance

As shown in Table 1, performance on MARS IV was related to the<br />

file review, NOAB merit score, BOTC performance, and MARS III<br />

performance. Of all the predictors, the file review accounted for<br />

the most variance in MARS IV performance. The NOAB merit score<br />

also correlated with MARS IV performance; however, the predictive<br />

contribution of the merit score was actually that provided by the

file review. Multiple regression analyses also showed that neither<br />

BOTC nor MARS III performance could account for variance of MARS<br />

IV beyond that already predicted by the file review (R = .28). The<br />

following variables were unrelated to MARS IV performance: (a) the<br />

interview; (b) the conducting officer's assessment; (c) the<br />

leaderless group discussion; (d) leadership stands; (e) military<br />

potential; and (f) tested learning ability.<br />

263


Table 1<br />

Correlation Matrix of Potential Predictors and Training Criteria<br />

1. CO    2. INT    3. FR    4. LS    5. LGD    6. MS    7. GC
8. MP    9. BOTC   10. M-3   11. M-4   12. PPT   13. PST

[The correlation values in this matrix are largely illegible in the
source scan and are not reproduced here.]

Note. Only correlations significant to the .05 level are reported<br />

in this table. Correlations between NOAB operational predictors<br />

are based on the population of NOAB candidates (n=743).<br />

Correlations between the NOAB predictors and training criteria are<br />

uncorrected correlations based on the sample of NOAB candidates<br />

attending MARS training (n=95). CO = conducting officer, INT =<br />

interview, FR = file review, LS = leadership stands, LGD =<br />

leaderless group discussion, MS = merit score, GC = general<br />

classification test, MP = military potential, BOTC = BOTC grade,<br />

M-3 = MARS III training results, M-4 = MARS IV training results,<br />

PPT = passage planning test, PST = problem sensitivity test.<br />

Experimental Predictors<br />

Because the PST and PPT have been incorporated into the NOAB<br />

only recently, there is not yet a sufficient number of candidates<br />

who have completed the two experimental tests at the NOAB and then<br />

gone on to complete MARS occupation training to evaluate the<br />

predictive validity of these tests. In the interim, the concurrent<br />

validity of the tests was evaluated by administering them to a<br />

small group of MARS candidates (n = 43 to 122) already in the<br />

training system. The results of this preliminary research indicate<br />

that the PPT is related to both MARS III (r = .21) and MARS IV<br />

performance (r = .30); however, the PST is not related to either<br />

MARS III or MARS IV training success. As shown in Table 1, the PPT

is unrelated to the file review, suggesting the potential for<br />

contributing incremental criterion prediction beyond that provided<br />

by the file review.

264


Psychometric Properties of NOAB Predictors

As a result of the inability of some NOAB predictors to<br />

provide criterion prediction, an evaluation of the psychometric<br />

properties of the NOAB exercises was conducted using two<br />

approaches.<br />

Factor analytic approach. The 30 dimensions measured by the

five NOAB exercises were submitted to a principal components<br />

analysis (varimax rotation) which produced a seven-factor solution<br />

shown in Table 2. As illustrated in Table 2, conceptually<br />

independent dimensions underlying the first four NOAB exercises

loaded on exercise factors rather than on factors with conceptually<br />

similar dimensions. It appears that these four exercises are<br />

producing global measures of overall performance on each exercise<br />

and not measuring exercise dimensions, thereby raising doubt about<br />

the construct validity of the dimension ratings that comprise each<br />

of the exercises. Candidates' scores on these exercises may be<br />

more attributable to the procedures followed by the NOAB than the<br />

candidates' abilities with respect to the dimensions the four<br />

exercises are supposed to be measuring. Table 2 shows that the

file review was the only NOAB exercise that appeared as a<br />

multidimensional construct (it measures three different constructs<br />

-- personal background, military experience, and intelligence).<br />

In addition, the file review was the only NOAB measure that<br />

predicted MARS training performance. The fact that the dimensions

underlying the file review loaded on factors with other<br />

conceptually similar dimensions and did not simply load on a file<br />

review factor provides evidence of construct validity for the<br />

dimension ratings comprising the file review score and may account<br />

for the file review's success as an NOAB predictor.<br />
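A minimal sketch of the kind of analysis described above (principal components of the dimension ratings followed by a varimax rotation), run on simulated ratings; the candidate count and the 30-dimension/7-factor structure follow the text, everything else below is assumed.

```python
# Sketch of a principal components analysis with varimax rotation, applied to
# simulated dimension ratings (743 candidates x 30 dimensions, 7 components kept).
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate a loading matrix toward the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3 - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0)))
        )
        rotation = u @ vt
        if s.sum() < var * (1 + tol):      # stop when the criterion no longer improves
            break
        var = s.sum()
    return loadings @ rotation

rng = np.random.default_rng(2)
ratings = rng.normal(size=(743, 30))                  # hypothetical dimension ratings
corr = np.corrcoef(ratings, rowvar=False)             # 30 x 30 correlation matrix
eigvals, eigvecs = np.linalg.eigh(corr)
keep = np.argsort(eigvals)[::-1][:7]                  # seven largest components
loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])  # unrotated component loadings
rotated = varimax(loadings)
print(np.round(rotated[:5], 2))                       # loadings of the first five dimensions
```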

Multitrait-multimethod matrix approach. To investigate

further the notion that the interview, leadership stands,<br />

conducting officer's assessment and leaderless group discussions<br />

are actually producing one global measure for each exercise without<br />

regard to the dimensions contained in the exercise, the<br />

correlations of conceptually similar across-exercise dimensions<br />

(similar dimensions measured by different selection exercises) were<br />

evaluated using the method described by Campbell and Fiske (1959).<br />

As reported in Bradley (1990), the correlations between

conceptually similar across-exercise correlations were lower than<br />

correlations between conceptually independent within-exercise<br />

dimensions, thereby lending further support to the notion that<br />

method variance is contaminating the measurement of NOAB exercise<br />

dimensions.<br />
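The Campbell and Fiske (1959) style comparison can be sketched as follows; the rating matrix and the trait and method counts are invented for illustration and do not reproduce the NOAB data.

```python
# Sketch of the Campbell-Fiske comparison: same-trait/different-method correlations
# versus different-trait/same-method correlations, on invented ratings.  "Traits"
# stand for exercise dimensions and "methods" for exercises; counts are arbitrary.
import itertools
import numpy as np

rng = np.random.default_rng(3)
n_candidates, n_traits, n_methods = 200, 3, 2
# ratings[c, m, t] = rating of candidate c on trait t as measured by method (exercise) m
ratings = rng.normal(size=(n_candidates, n_methods, n_traits))

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

same_trait_diff_method = [
    corr(ratings[:, m1, t], ratings[:, m2, t])
    for t in range(n_traits)
    for m1, m2 in itertools.combinations(range(n_methods), 2)
]
diff_trait_same_method = [
    corr(ratings[:, m, t1], ratings[:, m, t2])
    for m in range(n_methods)
    for t1, t2 in itertools.combinations(range(n_traits), 2)
]
# Method variance is suspected when the second mean exceeds the first, as reported here.
print("mean same-trait / different-method r:", round(float(np.mean(same_trait_diff_method)), 3))
print("mean different-trait / same-method r:", round(float(np.mean(diff_trait_same_method)), 3))
```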

Discussion<br />

The results of this validation research can be summarized as<br />

follows: (a) the file review is the only NOAB measure that predicts<br />

both training criteria; (b) the leadership stand assessment<br />

265


Table 2

Factor Structure of NOAB Dimensional Measures

Interview: self-confidence (.83), presence/bearing (.78), verbal expression (.69),
enthusiasm (.82), desire for MARS (.79), suitability for naval role (.81),
ability to become naval officer (.80).

Leadership task: initiative/decisiveness, seeking/accepting advice, preparation
and planning, communicating effectively, directing others, creating team
performance.

Leaderless group discussion: persuasiveness/forcefulness, self-confidence/bearing,
communication skills, leadership/maintaining the aim, alertness.

Conducting officer's assessment: supporting/cooperating with others, effectiveness
of leadership behaviour, individual effort and drive, desire for MARS, suitability
for naval environment.

File review: family background, military/para-military experience, military
potential, employment history, educational achievement, tested learning ability,
other activities/interests.

Note. Only factor loadings greater than .44 are included in this table. [The
seven-factor loading matrix is largely illegible in the source scan; only the
interview loadings could be recovered and are shown in parentheses above.]


predicts MARS III training success, but does not improve the<br />

prediction of MARS III beyond that already provided by the file<br />

review, and the leadership stand assessment does not predict MARS<br />

IV training success; (c) the NOAB merit score predicts MARS III and<br />

MARS IV performance, but all of this criteria prediction actually<br />

originates with the file review; (d) neither the interview,<br />

conducting officer, leaderless group discussion, nor the two CFRC<br />

measures -- tested learning ability and military potential -- provide

any prediction of MARS III or MARS IV training success; (e)<br />

the file review is the only NOAB measure that appears to be<br />

psychometrically sound; (f) the other four operational measures<br />

(interview, leadership stands, conducting officer, and leaderless

group discussion) require a psychometric overhaul (or replacement);<br />

and (g) of the two experimental NOAB measures, the PPT has the most<br />

potential for use as an operational NOAB predictor.<br />

Based on this study and the earlier BOTC validation by Okros<br />

et al. (1988), it has been recommended that the NOAB be retained<br />

as the assessment method for selecting MARS candidates and that<br />

efforts be made to improve the board's predictive efficacy by: (a)<br />

increasing the construct validity of exercise dimensions; (b)<br />

investigating the potential for applying situational interview<br />

methods and patterned behavioural interview techniques; (c)<br />

improving the predictive efficacy of the leadership stands; (d)<br />

evaluating the predictive efficacy of the General Classification<br />

(GC) test; and (e) developing new selection measures to replace<br />

the leaderless group discussion and conducting officers'<br />

assessments.<br />

References<br />

Bradley, J.P. (1990). A validation study on the Naval Officer Assessment Board's ability to predict MARS Officer training success (Working Paper 90-7). Willowdale, Ontario: Canadian Forces Personnel Applied Research Unit.

Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Okros, A.C., Johnston, V.W., & Rodgers, M.N. (1988). An evaluation of the effectiveness of the Naval Officer Selection Board as a predictor of success on the Basic Officer Training Course (Working Paper 88-1). Willowdale, Ontario: Canadian Forces Personnel Applied Research Unit.

267


A Situational Judgment Test of Supervisory Knowledge in the U.S. Army1<br />

Mary Ann Hanson<br />

Personnel Decisions Research Institutes, Inc.<br />

Walter C. Borman<br />

The University of South Florida and Personnel Decisions Research Institutes, Inc.<br />

A situational judgment test involves presenting respondents with realistic job situations, usually<br />

described in writing, and asking them to respond in a multiple choice format regarding what should be<br />

done in each situation. Situational judgment tests have been developed by other researchers to predict<br />

job performance, especially for management and supervisory positions (e.g., Motowidlo, Carter, & Dunnette,<br />

1989; Tenopyr, 1969).<br />

This paper describes the development, field test, and preliminary construct validation of a situational<br />

judgment test designed to measure supervisory skill for non-commissioned officers (NCOs) in the U.S.<br />

Army. In contrast with most previous research the Situational Judgment Test (SJT) is a criterion measure<br />

of job performance. It is targeted at first line supervisors (ranking E-5), and is intended to evaluate<br />

the effectiveness of their judgments about what to do in difficult supervisory situations. Thus, the SJT is<br />

somewhat like a job knowledge test in the supervisory part of the job. Although no research is available<br />

on the use of situational judgment tests as criterion measures, there is research available on the usefulness<br />

of written simulations - which are similar to situational judgment tests - as measures of professional<br />

knowledge in fields such as law and medicine. Researchers have found that scores on written simulations<br />

differentiate between groups with differing levels of experience or training and are often related to other<br />

measures of professional knowledge or performance (see Smith, 1983 for a review).<br />

Method

Development of the SJT

Development of the SJT involved asking groups of soldiers similar to the target NCOs (i.e., E-4s and<br />

E-5s) to describe a large number of difficult but realistic situations that Army first-line supervisors face<br />

on their jobs. Once a large number of these situations had been generated, a wide variety of possible<br />

actions (i.e., response alternatives) for each situation were gathered, and ratings of the effectiveness of

each of these actions were collected from both experts (senior NCOs) and the target group (E-5 NCOs in<br />

beginning supervisory positions). These effectiveness ratings were used to select situations and response<br />

alternatives to be included in the SJT. The effectiveness ratings from the senior NCOs (i.e., experts) were<br />

also the basis for the development of SJT scoring procedures. Each of these steps is described in more<br />

detail below.<br />

Participants in the workshops to develop situations and response alternatives were 52 NCOs from<br />

nine different Army posts. Some were NCOs from the target sample and some supervised target NCOs<br />

(ranks ranging from E-5 to E-6). A variation of the critical incident technique (Flanagan, 1954) was used<br />

to collect situations to be used as the item stems. Workshop participants were asked to write descriptions<br />

of difficult supervisory situations that they or their peers had experienced as first-line supervisors in the<br />

Army. This resulted in a pool of about 300 situations. Response alternatives were primarily generated by<br />

presenting participants in later workshops with the situations that had been collected and asking them to<br />

write, in two or three sentences, what they would do to respond effectively in that situation. This resulted<br />

in about 15 possible responses for each situation. These responses were content analyzed and grouped to

reduce redundancies. The final result was four to ten response alternatives for each situation, with a<br />

mean of about six response alternatives.<br />

1 This research was funded by the U.S. Army Research Institute for the Behavioral and Social Sciences,<br />

Contract No. MDA903-82-C-0531. All statements expressed in this paper are those of the authors and do not

necessarily reflect the official opinions or policies of the U.S. Army Research Institute or those of the Department<br />

of the Army.<br />

268<br />



One-hundred and eighty of the most promising situations were then chosen based on their content<br />

(e.g., appropriately difficult, realistic, etc.) and the number of plausible response alternatives available.<br />

For each of these 180 situations retained, information concerning the effectiveness of the various response<br />

alternatives was collected from two groups, a group of expert NCOs and a group of the target<br />

population NCO job incumbents. The expert NCOs were 90 students and instructors at the United States

Army Sergeants Major Academy. These NCOs were among the highest ranking enlisted soldiers in the<br />

Army (rank of E-8 to E-9), and all had extensive experience as supervisors in the Army. The target

NCOs were 344 second tour soldiers (rank of E-4 to E-5) who were participating in a field test of a group<br />

of job performance measures at several Army posts in the United States and Europe. For each SJT situation,<br />

these respondents were asked to rate the effectiveness of each response alternative on a seven point<br />

scale (1 = least and 7 = most effective). Because there were still 180 situations and time limitations, each<br />

soldier could only respond to a subset of the situations. This resulted in about 25 expert NCO and 45
incumbent NCO responses per situation.

Items (situations) for the field test version of the SJT and response alternatives for these items were
then selected based on these data. The following criteria were used to select 35 of these situations and

from 3-5 response alternatives for each situation: 1) the expert group had high agreement concerning the<br />

most effective response for the item; 2) the item was difficult for the incumbents (i.e., agreement was<br />

substantially lower than for the expert group); 3) the difference between the expert and the incumbent<br />

responses for each situation was judged to reflect an important aspect of supervisory knowledge; and 4)<br />

the content of the final group of situations was as representative as possible of the first-line supervisory<br />

job in the Army.<br />
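A hedged sketch of the screening implied by criteria 1 and 2 above (high expert agreement, lower incumbent agreement); the rating arrays and the agreement thresholds are invented assumptions, not the procedure actually used.

```python
# Sketch only: keep situations where experts agree on the best alternative but
# incumbents agree much less.  All ratings and thresholds here are invented.
import numpy as np

def agreement(ratings):
    """Share of raters whose top-rated alternative matches the modal top choice."""
    choices = np.argmax(ratings, axis=1)       # each rater's highest-rated alternative
    return np.bincount(choices).max() / len(choices)

rng = np.random.default_rng(4)
kept = []
for item in range(180):                                 # 180 candidate situations, as in the text
    preferred = rng.integers(0, 6)                      # alternative the (simulated) experts favour
    expert_ratings = rng.integers(1, 6, size=(25, 6))   # ~25 experts x 6 alternatives (1-7 scale)
    expert_ratings[:, preferred] = 7                    # experts largely agree on one alternative
    incumbent_ratings = rng.integers(1, 8, size=(45, 6))  # ~45 incumbents, no clear consensus
    if agreement(expert_ratings) >= 0.6 and agreement(incumbent_ratings) <= 0.4:
        kept.append(item)
print(f"{len(kept)} of 180 situations pass this illustrative screening rule")
```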

Field Test of the SJT<br />

The field test of the SJT had three major objectives. The first objective was to explore different<br />

methods of scoring the SJT. The second objective was to examine and evaluate the psychometric properties<br />

of this instrument. The final objective was to obtain preliminary information concerning the construct

validity of the SJT as a criterion measure of supervisory job knowledge.<br />

The SJT was administered as part of a larger data collection effort to a sample of 1049 NCOs (most<br />

were E-4s and E-5s) at a variety of posts in the United States and Europe. For each of the 35 SJT items,<br />

these soldiers were asked to place an “M” next to the response alternative they thought was the most<br />

effective and an “L” next to the response alternative they thought was the least effective.<br />

Scoring Procedures. Several different procedures for scoring the SJT were explored. The most<br />

straightforward was a simple number correct score. For each item, the response alternative that had been<br />

given the highest mean effectiveness rating by the experts (senior NCOs) was designated the “correct”<br />

answer. Respondents were then scored based on the number of items for which they indicated that this<br />

“correct” response alternative was the most effective. The second scoring procedure involved weighting<br />

each response alternative chosen by soldiers as the most effective by the mean effectiveness rating given<br />

to that response alternative by the expert group. This gives respondents more credit for choosing<br />

“wrong” answers that are relatively effective than for choosing wrong answers that are very ineffective.<br />

These item level effectiveness scores were then averaged to obtain an overall effectiveness score for each<br />

soldier. Averaging these item level scores instead of simply summing them placed respondents' scores

on the same 1 to 7 effectiveness scale as the experts’ ratings and ensured that respondents were not penalized<br />

for any missing data (up to 10% missing responses were allowed).<br />

Scoring procedures based on respondents’ choices for the least effective response to each situation<br />

were also explored. The ability to identify the least effective response alternatives might be seen as an<br />

indication of respondents’ ability to avoid these very ineffective responses or in effect to avoid “screwing<br />

up." As with the choices for the most effective response, a simple number correct score was computed:

the number of times each respondent correctly identified the response alternative that the experts rated<br />

the least effective. In order to differentiate this score from the number correct score based on choices for<br />

the most effective response, this score will be referred to as the L-Correct score, and the score based on<br />

choices for the most effective response (described previously) wit1 be referred to as the M-Correct score.<br />

Another score was computed by weighting respondents’ choices for the least effective response altema-<br />

269


by the mean effectiveness rating for that response, and then averaging these item level scores to

obtain an overall effectiveness score based on choices for the least effective response alternative. This<br />

score will be referred to as L-Effectiveness, and the parallel score based on choices for the most effective<br />

responses (described previously) will be referred to as M-Effectiveness.<br />

Finally, a scoring procedure that involved combining the choices for the most and the least effective<br />

response alternative into one overall score was also explored. For each item, the mean effectiveness of<br />

the response alternative each soldier chose as the least effective was subtracted from the mean effectiveness<br />

of the response alternative they chose as the most effective. Because it is actually better if<br />

respondents indicate that less effective response alternatives are the least effective, this score can be seen<br />

as a sum or composite of the two effectiveness scores described previously (i.e., subtracting a negative<br />

number from a positive number is the same as adding the absolute values of the two numbers). These<br />

item level scores were then averaged together for each soldier to generate yet another score, and this<br />

score will be referred to as M-L Effectiveness.<br />
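The five scoring procedures just described can be summarized in a short sketch; the expert mean effectiveness ratings and the respondent's choices below are invented for a three-item example.

```python
# Sketch of the five SJT scoring procedures for one hypothetical respondent.
import numpy as np

# expert_means[i][a] = mean expert effectiveness rating of alternative a on item i (invented)
expert_means = [np.array([6.4, 3.1, 2.0, 4.5]),
                np.array([5.8, 5.9, 1.7]),
                np.array([2.2, 6.7, 4.0, 3.3, 5.1])]
most_choice = [0, 0, 1]     # alternative marked "M" (most effective) on each item
least_choice = [2, 2, 0]    # alternative marked "L" (least effective) on each item

m_correct = sum(int(np.argmax(e) == m) for e, m in zip(expert_means, most_choice))
l_correct = sum(int(np.argmin(e) == l) for e, l in zip(expert_means, least_choice))
m_eff = np.mean([e[m] for e, m in zip(expert_means, most_choice)])
l_eff = np.mean([e[l] for e, l in zip(expert_means, least_choice)])   # lower is better
m_l_eff = np.mean([e[m] - e[l] for e, m, l in zip(expert_means, most_choice, least_choice)])

print(f"M-Correct={m_correct}, L-Correct={l_correct}, "
      f"M-Eff={m_eff:.2f}, L-Eff={l_eff:.2f}, M-L Eff={m_l_eff:.2f}")
```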

Descriptive Statistics. Descriptive statistics and internal consistency reliability estimates (KR-20)

were computed for each of the five scoring procedures. Intercorrelations were also computed among the<br />

five scores generated by the five different scoring procedures.<br />

Preliminary Information Concerning Construct Validity

The data from this field test were also used to obtain preliminary information concerning the construct<br />

validity of the SJT as a criterion measure of supervisory job knowledge. As mentioned previously, collecting

the field test data for the SJT was a part of a larger data collection effort. Several other job performance<br />

measures were administered concurrently with the SJT, including job knowledge tests, a self-report<br />

administrative information survey, and supervisory simulation exercises (involving training a subordinate,<br />

disciplinary counseling, and personal counseling). Performance ratings were also collected from<br />

peers and supervisors using behavior-based rating scales. If the SJT is a valid measure of supervisory job<br />

knowledge, certain relationships would be expected with these other measures. For example, it should<br />

have at least moderate correlations with the scores on the supervisory simulations and performance ratings<br />

on supervisory dimensions. Correlations of SJT scores with several of these other job performance<br />

measures were examined.<br />

Another type of information that was used to assess the construct validity of the SJT was the extent<br />

to which the knowledges assessed by the SJT are learned on the job. If the SJT is a valid measure of job

knowledge, soldiers who have more experience or training would be expected, on average, to obtain<br />

higher scores than soldiers with less experience or training. Self report information was collected from<br />

the soldiers in this field test sample concerning whether or not they had attended any supervisory training<br />

and how regularly they were required to supervise other soldiers. Mean SJT scores for soldiers with

different levels of training and experience were also examined.<br />

Results

Field Test Results

Table 1 presents the mean score for each of the five scoring procedures. The maximum possible for<br />

the M-Correct scoring procedure is 35 (i.e., all 35 items answered correctly), but the mean score obtained<br />

by soldiers in this sample was only 16.25. The maximum score obtained was only 27. The mean number<br />

of least effective response alternatives correctly identified by this group was only 14.86. Clearly the SJT<br />

was difficult for this group of soldiers.<br />

Table 1 also presents the standard deviation for each of the five scoring procedures, and all of the<br />

scoring procedures resulted in a reasonable amount of variability in scores obtained by the soldiers in this<br />

sample. Table 1 also shows that the internal consistency reliabilities for all of these scoring procedures<br />

are quite high. The most reliable score is M-L Effectiveness, probably because this score contains more<br />

information than the other scores (i.e., choices for both the most and least effective response).<br />

.~ .,<br />

f---~..-wz.-.-.. _ .,._ _ 2 7 0


Table 2 presents the intercorrelations among scores obtained using the five different scoring procedures.

These intercorrelations range from moderate to very high. Correlations between scores that are<br />

based on the same set of responses (e.g., M-Correct with M-Effectiveness) are higher than correlations
between scores that are based on different sets of responses (e.g., M-Correct with L-Correct). The correlations
between L-Effectiveness and the other scores are negative, because lower L-Effectiveness scores
are actually better. The high (negative) correlation between M-Effectiveness and L-Effectiveness seems

to indicate that these two scores measure similar or related constructs.<br />

Table 1<br />

Situational Judgment Test Means, Standard Deviations, and Internal Consistencies<br />

Scoring Procedure         N        Mean     SD    Internal Consistency
                                                   Reliability(1)
M-Correct               1025(3)    16.52    4.29        .50
M-Effectiveness         1025(3)     4.91     .34        .68
L-Correct               1007(3)    14.86    3.86        .57
L-Effectiveness         1007(3)     3.54(2)  .31        .68
M-L Effectiveness       1007(3)     1.36     .61        .75

(1) KR-20.
(2) Low scores indicate higher performance.
(3) Soldiers with more than 10% incomplete or invalid data were omitted from these analyses.

Table 2<br />

Situational Judgment Test Score Intercorrelations for the Five Scoring Procedures<br />

             M-Eff.    L-Correct    L-Eff.    M-L Eff.
M-Correct     .94         .52        -.44       .86
M-Eff.                    .59        -.70       .93
L-Correct                            -.86       .78
L-Eff.                                          -.92

Note. Sample sizes range from 1007 to 1025.<br />

271<br />


The M-Correct and L-Correct scores have less desirable psychometric properties than the scores<br />

obtained using the other three scoring procedures. In addition, these two scores contain information that<br />

is very similar to the information provided by the M-Effectiveness and L-Effectiveness scores respectively,<br />

because they are based on the same sets of responses. Thus, results reported for the remainder of the<br />

analyses will not include these two scores.<br />

Preliminary Information Concerning Construct Validity

Table 3 shows the correlations of the three remaining SJT scores with scores from the other job<br />

performance measures. The SJT scores correlate moderately with a composite of scores on the three
supervisory simulations. The SJT scores also have moderate correlations with the performance rating

composite called Leading/Supervising. Correlations with the other performance rating composites are<br />

slightly lower. Correlations with scores on the job knowledge tests are quite high, but this is not surprising<br />

in view of the fact that these are also paper-and-pencil tests. Finally, the SJT scores have moderate<br />

correlations with a variable called “grade deviation score”, which is essentially promotion rate. Promotion<br />

rate might be seen as an overall measure of success as a soldier.<br />

Table 3<br />

Correlations Between SJT Scores and Other Job Performance Measures

                Performance Rating Composites(3)
            Leading/      Technical     Personal     Effort/Military    Job            Grade        Supervisory
            Supervising   Performance   Discipline   Bearing            Knowledge(1)   Deviation(2)  Simulation
                                                                                       Score         Composite(4)
M-Eff.         .24           .21           .20           .11              .40            .20           .20
L-Eff.        -.18          -.17          -.15          -.06             -.34           -.20          -.16
M-L Eff.       .22           .21           .18           .10              .40            .22           .20

(1) Weighted mean across nine MOS; sample size per MOS ranges from 38 to 146.
(2) This variable is essentially promotion rate; sample sizes range from 849 to 919.
(3) Based on pooled peer and supervisor ratings. Sample sizes range from 855 to 907; a correlation of .07 is significant at the .05 level.
(4) Composite of scores from three simulations: personal counseling, disciplinary counseling, and training. Sample sizes range from 873 to 909; a correlation of .07 is significant at the .05 level.

Table 4 shows the mean SJT scores of soldiers who reported various levels of supervisory training.<br />

Soldiers who had attended no supervisory school at all scored almost a half a standard deviation lower<br />

than those who had attended one or more supervisory schools. One potential confound in this comparison

is that the opportunity to attend supervisory schools varies, and decisions concerning which soldiers<br />

are given the opportunity to attend these schools may be influenced by their effectiveness as soldiers or<br />

as supervisors. As a result, it is possible that these mean SJT score differences were obtained because the<br />

more effective soldiers were given the opportunity to attend supervisory training. However, regardless of<br />

whether these differences are the result of differential opportunities or training in the relevant supervisory

skills, these mean score differences provide some support for the construct validity of the SJT as a measure<br />

of supervisory skill.<br />

Mean SJT scores are also reported on Table 4 for subgroups of soldiers identified by how frequently<br />

they reported supervising other soldiers. For all three SJT scoring procedures the expected pattern was<br />

found; soldiers who reported that they supervised other soldiers more frequently obtained better SJT<br />

scores. The largest difference is for the L-Effectiveness score. Soldiers who reported that they regularly

supervise other soldiers obtained L-Effectiveness scores almost half a standard deviation better (i.e.,<br />

272<br />



lower) than those of soldiers who reported that they never supervise other soldiers. These results for<br />

supervisory experience are slightly different than those obtained for supervisory training, where the<br />

largest mean differences were found for the M-Effectiveness score. Perhaps this is because supervisory<br />

experience sometimes involves making mistakes and learning from the consequences of these mistakes

(i.e., learning to identify ineffective responses), but supervisory training is more likely to focus on the<br />

identification of effective supervisory responses.<br />

Table 4<br />

Mean Situational Judgment Test Scores for Soldiers With Different Levels of<br />

Supervisory Training and Experience<br />

                                                   N        M-Eff.    L-Eff.    M-L Eff.
Attended one or more supervisory schools         560-603     4.97      3.50       1.47
Attended no supervisory school                   327-371     4.81      3.62       1.20

How often required to supervise other soldiers:
  Never                                           87-99      4.87      3.63       1.23
  Sometimes fill in for regular supervisor       294-327     4.86      3.58       1.29
  Often fill in for regular supervisor           125-135     4.90      3.53       1.38
  Regularly supervise other soldiers             391-415     4.96      3.49       1.47

Conclusions<br />

The results of the field test of the SJT indicate that this test is appropriately difficult for the target<br />

sample. The five scoring procedures that were explored all resulted in scores with a reasonable amount<br />

of variance among the soldiers in this sample. Internal consistency reliabilities were also quite high.<br />

Based on all of the psychometric properties examined, the most promising score appears to be M-L Effectiveness,<br />

which has an internal consistency reliability of .75.<br />

The preliminary information obtained concerning the construct validity of the SJT provides evidence<br />

that the SJT is a valid measure of supervisory job knowledge. The correlations of SJT scores with the<br />

other job performance measures provide some support for the construct validity of the SJT. However, the<br />

SJT also has moderate correlations with several measures of technical performance and with promotion<br />

rate. Mean SJT scores for soldiers with different levels of supervisory experience and training indicate<br />

that the knowledge or skill measured by the SJT is, to some extent, learned on the job and in supervisory<br />

training.<br />

REFERENCES<br />

Motowidlo, S. J., Dunnette, M. D., & Carter, G. W. (in press). An alternative selection procedure: The
low-fidelity simulation. Journal of Applied Psychology.

Smith, I. L. (1983). Use of written simulations in credentialing programs. Professional Practice of
Psychology, 4, 21-50.

Tenopyr, M. L. (1969). The comparative validity of selected leadership scales relative to success in
production management. Personnel Psychology, 22, 77-85.

273


Context Effects on Multiple-Choice Test Performance<br />

Lawrence S. Buck*

Planning Research Corporation, System Services<br />

Introduction<br />

It has long been a tenet of test construction theory and practice that test items<br />

measuring the same content or behavioral objectives should be grouped within a

test. For example, Tinkelman (1971) stated:<br />

If items measuring different content objectives or different behavioral<br />

objectives are included in the same test, consideration should be given to<br />

grouping the items by type. Usually the continuity of thought that such<br />

grouping allows on the part of the examinee is found to enhance the<br />

quality of his/her performance.<br />

Other rationales for grouping similar items include such viewpoints as: test anxiety<br />

may be reduced by grouping items on a test, examinees will concentrate better if<br />

they do not jump from subject to subject, and examinees might glean information<br />

from certain questions in a set of questions that will facilitate the answering of other<br />

questions in the set (Gohmann & Spector, 1989).<br />

A majority of the studies addressing item positioning have centered on the effects of<br />

ordering questions by difficulty level rather than by content. (For a representative<br />

sample, see: Hodson, 1984; Sax & Cromack, 1966; Leary & Dorans, 1985; and Plake,<br />

1980.) Numerous other studies, primarily in the educational arena, have addressed<br />

the effects of randomizing items in tests rather than presenting the items in the<br />

order that the information is covered in the classroom or in the textbook(s). (For a<br />

representative sample, see: Gohmann & Spector, 1989; Taub & Bell, 1975; and<br />

Bresnock, Graves, & White, 1989).<br />

The primary focus of this study is the effect on part and total test performance of<br />

randomizing the items on multiple-choice tests normally constructed with the items<br />

grouped by content areas or domains. A secondary objective was to evaluate the<br />

effects on the individual item statistics. The items in the tests in question are<br />

normally presented from easiest to most difficult within each domain.<br />

Two tests were selected for this study, Rigging and Weight <strong>Testing</strong> (BM-0110) and<br />

Outside Electrical (EM-4613). These tests are part of a testing program which<br />

develops, administers, and maintains Journeyman Navy Enlisted Classification (JNEC)<br />

exams for the Navy’s Intermediate Maintenance Activity (IMA) community. The tests<br />

are part of the qualification process for special classification codes. Both the BM-<br />

0110 and EM-4613 examinations consist of 120, four-choice, multiple-choice test<br />

questions spread across six domains as indicated in Table I below.<br />

Table I<br />

Test Item-Domain Breakdown<br />

Domains<br />

Test # of Items 1 2 3 4 5 6<br />

BM-0110 120 18 30 14 12 30 16<br />

EM-4613 120 10 6 14 55 22 13

*The author wishes to thank Norma Molina-laggard for her able assistance with the data analyses<br />

274


For each administration, the tests were generated with a total test and each domain<br />

mean difficulty index (p-value) of .60. The tests are essentially power tests with

three hours allowed. The cutting score for each test is based on 62.5% of the<br />

number of test questions (a score of 75) or the group mean, whichever is higher. The<br />

cutting score was 75 for each of the tests for each administration. The test items<br />

were selected in accordance with the following parameters: p-values between .25

and .90 and biserials between .15 and .99.<br />
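A small sketch of the construction rules just described, with an invented item bank; only the p-value bounds, the biserial bounds, and the cutting-score rule come from the text.

```python
# Sketch only: screen an invented item bank by the stated p-value and biserial
# bounds, and compute the cutting score as the larger of 62.5% of the question
# count and the group mean.
import numpy as np

rng = np.random.default_rng(5)
p_values = rng.uniform(0.1, 1.0, 400)            # hypothetical item difficulty indices
biserials = rng.uniform(0.0, 1.0, 400)           # hypothetical item discrimination values

eligible = (p_values >= 0.25) & (p_values <= 0.90) & (biserials >= 0.15) & (biserials <= 0.99)
print("eligible items in the bank:", eligible.sum())

n_items = 120
group_scores = rng.normal(72, 8, 60)             # hypothetical total scores for one administration
cutting_score = max(0.625 * n_items, group_scores.mean())
print("cutting score:", round(cutting_score, 1)) # 0.625 * 120 = 75, or the group mean if higher
```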

The tests are administered twice yearly, in the spring and fall, to enlisted Navy<br />

personnel in pay grades E-5 through E-9, with a minimum of nine months experience<br />

in an IMA activity. BM-0110 was developed in the summer of 1987 and placed into<br />

operational use in the fall of 1987. EM-4613 was developed in the fall of 1987 and<br />

placed into operational use in the spring of 1988. All tests in this program were

developed by subject-matter experts from each trade under the tutelage of a testing<br />

specialist. All of the tests are computer generated by an automated test processing<br />

system (TPS) that includes item banking, scoring, and analysis and updating of all<br />

test and item data.<br />

Procedure<br />

Three different administrations -- Spring 1989 (1-89), Fall 1989 (2-89), and Spring
1990 (1-90) -- were used for this study for both the BM-0110 and EM-4613 tests. Both
the 1-89 and 1-90 tests were constructed under normal procedures, i.e., with items

grouped by domain and presented from easiest to most difficult within each<br />

domain. For the 2-89 administrations, the test items were randomized without<br />

regard for content area or difficulty level.<br />
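The two form-construction rules being compared can be illustrated as follows; the domains, pool size, and p-values are invented for the example (items with higher p-values are easier).

```python
# Sketch of the two orderings: the standard form groups items by domain and
# presents them easiest to hardest within each domain; the experimental form
# shuffles the pool with no regard for domain or difficulty.
import random

random.seed(0)
# (domain, p_value, item_id) triples for a small invented item pool
pool = [(d, round(random.uniform(0.25, 0.90), 2), f"item{d}-{i}")
        for d in range(1, 4) for i in range(1, 5)]

standard_form = sorted(pool, key=lambda item: (item[0], -item[1]))   # by domain, easiest first
randomized_form = pool[:]
random.shuffle(randomized_form)                                      # randomized presentation

print("standard  :", [item_id for _, _, item_id in standard_form])
print("randomized:", [item_id for _, _, item_id in randomized_form])
```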

The items for each administration were generated by the TPS from the total item<br />

pool available for each test and therefore the items were not identical across<br />

administrations. Table II presents the number of items common to each pair of test<br />

administrations.<br />

Table II<br />

Common Items Between Administrations<br />

            1-89 - 2-89    1-89 - 1-90    2-89 - 1-90
BM-0110          71             77             89
EM-4613          66             67             67

Under ideal conditions, the research design would have used the same items for each<br />

administration and both forms of the test would have been administered at the<br />

same time. However, due to a number of factors including fairly small N’s and<br />

numerous repeat candidates from one test administration to another, the ideal<br />

design was not possible. The test populations do tend to be quite stable from one<br />

administration to another, however, in terms of trade experience and numbers from<br />

each paygrade.<br />

The test results and item statistics from each administration for each test were<br />

compared with the other administrations from four different perspectives -- total<br />

test results, part test scores, common item comparisons, and individual item<br />

statistics. As previously stated, the objectives were to determine if randomizing the<br />

items would have any effect on total test performance, part (domain) test<br />

performance, and individual item statistics. A variety of statistical procedures were<br />

employed to analyze the data including Z-tests, two-tailed t-tests, and ANOVAs.<br />

275


Results<br />

Total Test Performance. With respect to total test performance, the test results<br />

were quite consistent from administration to administration as reflected in Table III.<br />

The 2-89 administration seems to be a little easier for both the BM-0110 and EM-<br />

4613 tests although the differences are small. The test reliabilities also remained<br />

reasonably consistent across test administrations.<br />

Table III<br />

Summary Test Statistics<br />

BM-0110        EM-4613

A Z-test was applied to the mean test scores between paired comparisons, i.e., 1-89 with 2-89, etc., and all results were nonsignificant at the .05 level. In this respect, we were unable to reject the null hypothesis for any comparison. An ANOVA was also calculated across each of the three administrations and the results were not significant at the .05 level for either the BM-0110 (F[2,359] = 1.183) or EM-4613 (F[2,359] = .028).
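For illustration, the comparisons reported above (a Z-test on paired mean scores and a one-way ANOVA across administrations) might be computed as in the sketch below, here on invented score vectors rather than the actual JNEC data.

```python
# Sketch only: two-sample Z-test on mean scores and one-way ANOVA across three
# administrations, using invented score vectors and scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
admin_1_89 = rng.normal(74, 9, 48)      # hypothetical total scores; sizes loosely follow Table IV
admin_2_89 = rng.normal(75, 9, 38)
admin_1_90 = rng.normal(74, 9, 34)

def z_test(a, b):
    """Large-sample Z for a difference in means, using the sample standard errors."""
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    z = (a.mean() - b.mean()) / se
    return z, 2 * stats.norm.sf(abs(z))

z, p = z_test(admin_1_89, admin_2_89)
print(f"1-89 vs 2-89: z = {z:.2f}, p = {p:.3f}")

f, p = stats.f_oneway(admin_1_89, admin_2_89, admin_1_90)
print(f"ANOVA across administrations: F = {f:.3f}, p = {p:.3f}")
```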

Table IV below presents another way of comparing the overall test results as the

passing rates by paygrade are presented for each administration. The passing rates<br />

are reasonably consistent across test administrations with somewhat higher<br />

percentages passing for the 2-89 test. These results are not inconsistent with the<br />

test results from other tests in the program where some fluctuations occur but the<br />

passing rates remain fairly consistent for each paygrade.<br />

Table IV<br />

Test Results by Paygrade<br />

BM-0110<br />

                  1-89                  2-89                  1-90
Paygrade      N   Passing   %       N   Passing   %       N   Passing   %
E-5          34      13    38      26      11    42      21       7    33
E-6           9       5    56      10       6    60      12       5    42
E-7           5       2    40       2       1    50       1       1   100
E-8 & E-9     0       0     -       0       0     -       0       0     -
TOTALS       48      20    42      38      18    47      34      13    38

276


Table IV cont.<br />

EM-4613

                  1-89                  2-89                  1-90
Paygrade      N   Passing   %       N   Passing   %       N   Passing   %
E-5          43      16    37      42       9    21      25       6    24
E-6          44      18    41      42      22    52      31      12    39
E-7          11       5    45       8       7    88       4       2    50
E-8 & E-9     1       0     0       1       1   100       2       2   100
TOTALS       99      39    37      93      39    42      62      22    35

Part Test Performance. In addition to evaluating any effects on total test<br />

performance of randomizing the items it was also considered prudent to consider<br />

any effects on domain performance. As indicated in Table V below, the results are<br />

similar to those reported in Table III for total test performance. That is, the average

domain scores are quite consistent across test administrations with the 2-89<br />

administration being somewhat easier for almost all domains across the three<br />

administrations.<br />

Table V<br />

Average Domain Scores<br />

BM-0110 EM-461 3<br />

Randomized complete block design ANOVAs were computed for the domain scores<br />

across the three administrations of each test and the results were not significant for<br />

either the BM-0110 or EM-4613, (F[2,17] = 2.36) and (F[2,17] = .015) respectively.<br />

Common Item Comparisons. Since it was not possible to use the same items in total<br />

for each of the three test administrations, it was also necessary to evaluate the<br />

effect, if any, on the subset of common items for each paired comparison. A two-tailed

t-test was used to analyze the items common to each pair of administrations<br />

and all results for both the BM-0110 and EM-4613 were nonsignificant at the .05<br />

level. In addition, ANOVAs were calculated for each of the three administrations of

the BM-0110 and EM-4613 tests and the results failed to reveal any significant<br />

differences at the .05 level of significance, (F[2,74] = .044) and (F[2,146] = .720)

respectively.<br />

Individual Item Statistics. The issue of any effect on item statistics of varying the<br />

item’s position was investigated by comparing the item difficulty indexes (p-values)<br />

of common items in each pair of test administrations as well as the item<br />

277


discrimination values (biserials). That is, does presenting the items in other than<br />

their normal domain and without regard to difficulty level, have an effect on the<br />

items’ statistics? Table VI presents the average p-value changes for the common<br />

items between the paired test administrations. The first test in each pair served as<br />

the base item position for comparative purposes. As indicated in Table VI, the<br />

average item p-values showed a somewhat greater tendency to increase (items easier)
than to decrease, although the differences are small. The average overall

change in the items’ p-values remained quite consistent across the three pairs of test<br />

administrations for both the BM-0110 and the EM-4613.<br />
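The item-level comparison described above can be sketched as follows; the 0/1 response matrices are invented, and an uncorrected item-total point-biserial is used here as a stand-in for the biserial index reported in the study.

```python
# Sketch (invented 0/1 response data): for items common to two administrations,
# compute the change in p-value and an item-total point-biserial on each form.
import numpy as np

rng = np.random.default_rng(7)
n_examinees, n_common = 60, 10
form_a = (rng.random((n_examinees, n_common)) < 0.60).astype(int)   # hypothetical responses, form A
form_b = (rng.random((n_examinees, n_common)) < 0.65).astype(int)   # hypothetical responses, form B
total_a, total_b = form_a.sum(axis=1), form_b.sum(axis=1)           # total scores per examinee

for item in range(n_common):
    p_a, p_b = form_a[:, item].mean(), form_b[:, item].mean()       # item difficulty (p-values)
    r_a = np.corrcoef(form_a[:, item], total_a)[0, 1]               # item-total correlation, form A
    r_b = np.corrcoef(form_b[:, item], total_b)[0, 1]               # item-total correlation, form B
    print(f"item {item:2d}: p-value change = {p_b - p_a:+.2f}, "
          f"discrimination A = {r_a:.2f}, B = {r_b:.2f}")
```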


Table VI

Comparison of Common Items' P-Values

                        Average P-Value Change By Relative Position

Paired                 Average Overall     Average Increase*     Average Decrease*
Administrations            Change
EM-4613
  1/89 with 1/90            .095             .084     .075         .107     .098
  2/89 with 1/89            .091             .105     .079         .092     .121
  2/89 with 1/90            .099             .095     .117         .087     .112
BM-0110
  1/89 with 1/90                             .199     .055         .199     .071
  2/89 with 1/89            .222             .246     .091         .198     .068
  2/89 with 1/90            .213             .227     .087         .190     .082

*The first column represents the average for the first test of each pair;
the second column represents the second test.

With respect to the items’ biserials, Table VII presents the average biserials for the<br />

common items in each pair of test administrations. As was the case with the p-values,

the average biserials were quite consistent between paired test<br />

administrations with the differences quite small.<br />

Table VII<br />

Average Biserials for Common Items<br />

of Paired Test Administrations<br />

                        EM-4613             BM-0110
1/89 with 2/89        .29      .25        .30      .34
1/89 with 1/90        .32      .25        .32      .28
2/89 with 1/90        .21      .22        .32      .26


Discussion<br />

This study failed to show that randomizing the items in a multiple-choice test would<br />

have a deleterious effect on examinees with respect to test performance. If<br />

anything, the randomized tests were somewhat easier although the differences<br />

were small and were not significant. The effects on item statistics were minimal as<br />

the item difficulty indices (p-values) showed no clear trend of increasing or<br />

decreasing when comparing randomized vs. nonrandomized tests, and the item<br />

discrimination values (biserials) remained quite consistent across test<br />

administrations.<br />

Within the confines of this study, it was not possible to assess examinee reaction to<br />

the different test formats to discern whether the different item presentations were<br />

perceived differently by the examinees. Nor was it possible to determine whether<br />

examinees answer questions in order or tend to skip around and group like<br />

questions even though they are not grouped on the test. Studies by Tuck (1978) and
Allison and Thomas (1986) have suggested that few examinees answer questions in

order and that there is a tendency to group similar items.<br />

The study supports the stability of item statistics across different test formats and<br />

administrations and the lack of any significant contextual or item position effects on<br />

test performance. The implication of these findings is that, to preclude the possibility

of cheating, randomized versions of the same tests could be administered without<br />

fear of creating an unfair advantage or disadvantage.<br />

References<br />

Allison, D.E., and D.C. Thomas. 1986. Item-difficulty sequence in achievement
examinations: Examinees' preferences and test taking strategies. Psychological
Review 59, 867-70.

Bresnock, A.E., P.E. Graves, and N. White. 1989. Multiple-choice testing: Questions
and response position. Journal of Economic Education, (Summer), 239-245.

Gohmann, S.F., and L.C. Spector. 1989. Test scrambling and student performance.
Journal of Economic Education, (Summer), 235-238.

Hodson, D. 1984. The effect of changes in item sequence on student performance
in a multiple-choice chemistry test. Journal of Research in Science Teaching, Vol.
21, No. 5, 489-495.

Leary, L.F., and N.J. Dorans. 1985. Implications for altering the context in which test
items appear: A historical perspective on an immediate concern. Review of
Educational Research 55, (Fall), 387-413.

Plake, B.S. 1980. Item arrangement and knowledge of arrangement on test scores.
Journal of Experimental Education 49, (Fall), 56-58.

Sax, G., and T.A. Cromack. 1966. The effects of various forms of item arrangements
on test performance. Journal of Educational Measurement 3, 309-311.

Taub, A.J., and E.B. Bell. 1975. A bias in scores on multiple-form exams. Journal of
Economic Education 7, (Fall), 58-59.

Tinkelman, S.N. 1971. Planning the objective test. In R.L. Thorndike (Ed.), Educational
Measurement (2nd ed.). Washington, D.C.: American Council on Education.

Tuck, J.P. 1978. Examinee's control of item difficulty sequence. Psychological
Reports 42, 1109-10.



ABSTRACT<br />

DIETARY EFFECTS ON TEST PERFORMANCE<br />

Charles A. Salter<br />

Laurie S. Lester<br />

Susan M. Luther<br />

Theresa A. Luisi<br />

U.S. Army Natick Research, Development & Engineering Center<br />

Natick, MA<br />

Previous research suggests that meal composition may affect performance<br />

on the automated Memory and Search Task (MAST). The purpose of this study<br />

was to determine if lunch protein or carbohydrate would interact with caffeine to<br />

affect performance and mood as assessed by the MAST, the Automated Portable<br />

Test System (APTS), and visual-analogue mood scales. Male subjects were<br />

assigned either to a protein lunch (5 g/kg turkey breast) or a carbohydrate lunch (5<br />

g/kg sorbet) group so that normal caffeine intakes were equivalent. Within each<br />

group, subjects rotated through two caffeine conditions in a counterbalanced order,<br />

drinking two cups of either caffeinated or decaffeinated coffee with lunch. Caffeine<br />

use was prohibited at other times during the study. The APTS was consistent with<br />

the MAST in showing no performance effects of protein and caffeine, though<br />

protein did correlate with some self-reported moods. The protein group reported<br />

increased hunger over time (p=.002) and felt less dejected (p=.04) than did the

carbohydrate group, while caffeine produced no significant effects. Greater<br />

carbohydrate intake was associated with lower MAST scores, though the direction<br />

of causation is unclear, and it had no effect on the APTS. It is concluded that<br />

performance on the MAST and APTS are relatively unaffected by dietary<br />

differences of this type and magnitude.<br />

INTRODUCTION<br />

The automated Memory and Search Task (MAST) uses a hand-held<br />

computer to present stimuli consisting of randomized sequences of 16 alphabetic<br />

characters each along with randomized targets of 2, 4, or 6 letters that the subject<br />

identifies as being present within or absent from each stimulus (Salter et al, 1988).<br />
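To make the task structure concrete, the short Python fragment below generates one MAST-style trial as described above: a 16-letter stimulus and a 2-, 4-, or 6-letter target that is either contained in or absent from it. It is a minimal sketch based only on this description (it assumes, for simplicity, that the 16 characters are distinct), not the software that ran on the hand-held computer.

    import random
    import string

    def make_mast_trial(target_size=4, present=True):
        # Build one MAST-style trial: a 16-letter stimulus and a small target set.
        letters = list(string.ascii_uppercase)
        random.shuffle(letters)
        stimulus = letters[:16]                                 # 16 distinct alphabetic characters
        if present:
            target = random.sample(stimulus, target_size)       # target drawn from the stimulus
        else:
            target = random.sample(letters[16:], target_size)   # target absent from the stimulus
        random.shuffle(stimulus)
        return "".join(stimulus), "".join(target), present

    # Example: a 6-letter target that is absent from the stimulus
    stim, targ, present = make_mast_trial(target_size=6, present=False)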

The first studies with the MAST used it as a tool to assess dietary effects on<br />

performance (Salter et al, 1988). These studies indicated a significant post-lunch<br />

slump in MAST scores followed by recovery later in the afternoon. Salter and Rock<br />

(1989) did not find a post-meal decrement in performance on the MAST when<br />

slightly different times were used for testing. This latter study did find, however, that<br />

the more protein subjects ate at lunch, the better they scored on the MAST<br />



afterwards. The purpose of the current study was to determine whether such<br />

nutrients/food ingredients as protein, carbohydrate, and caffeine would affect<br />

performance not only on the MAST but also on several subtests (pattern<br />

recognition, reaction time, symbolic reasoning, and hand-tapping) of the<br />

Automated Portable Test System or APTS (Bittner et al, 1985) and visual-analogue<br />

mood scales.<br />

Previous research has demonstrated that protein can enhance performance<br />

because it contains tyrosine, the amino acid precursor to norepinephrine, which<br />

helps the body function in states of arousal or stress (Lieberman et al, 1984). On

the other hand, carbohydrate leads to insulin release which helps clear the blood<br />

of amino acids except tryptophan, resulting in greater passage of this serotonin<br />

precursor into the brain. Serotonin can induce a drowsy quiescent state capable of<br />

suppressing performance (Lieberman et al, 1982/83). Caffeine is a commonly<br />

used performance enhancer demonstrated to increase alertness and vigilance<br />

(Sawyer, Julia, and Turin, 1982). We particularly wanted to test whether caffeine<br />

would interact with either protein or carbohydrate in affecting performance.<br />

METHOD<br />

The subjects were military and civilian employees, males only, at the US<br />

Army Natick Research, Development & Engineering Center. All potential subjects<br />

were screened for previous caffeine use. Only those who normally consumed<br />

between 2 and 4 caffeinated beverages (coffee, tea, or soda) per day were<br />

retained. The subjects then filled out a questionnaire regarding their typical<br />

caffeine use, from which their total daily caffeine ingestion was estimated. The<br />

subjects were then split into a protein-lunch group (16 subjects) and a<br />

carbohydrate-lunch group (18 subjects) so that the average daily caffeine intake<br />

was equivalent in both groups.<br />

On the first day of testing, the subjects were trained in the use of the<br />

automated MAST, the APTS (using the pattern recognition, reaction time, symbolic<br />

reasoning, and hand-tapping subtests), and visual-analogue mood scales<br />

(indicating on a 100-mm line how relatively tense, hungry, dejected, tired, angry,

vigorous, and confused they felt). On the following two days of testing, all subjects<br />

were fed the same standard, mixed-nutrient breakfast at 0730 hours, tested at 1000<br />

hrs, fed the experimental lunch at 1130, given a math exercise immediately after,<br />

then tested shortly after noon and finally at 1430. The timed math exercise (30<br />

minutes maximum) was used because previous studies (Morse et al, 1989) found<br />

that it served as an effective stressor to mobilize norepinephrine use. The protein<br />

lunch group was served 5 g/kg turkey breast, while the carbohydrate lunch group<br />

was provided with 5 g/kg sorbet. These two foods were chosen because previous<br />

research had demonstrated them capable of having behavioral effects (Spring,<br />

Lieberman, Swope, and Garfield, 1986). Subjects were instructed to eat as much<br />

of their test meals as they could, but there was a wide variation in the proportion<br />



consumed. Within each group, subjects rotated through two caffeine conditions in<br />

a counterbalanced order, drinking two cups of either caffeinated or decaffeinated<br />

coffee with lunch. Caffeine use was prohibited at other times during the study.<br />

RESULTS AND DISCUSSION<br />

Analysis of variance tests indicated no significant differences on MAST<br />

performance as a function of group (protein vs. carbohydrate), caffeine (or its<br />

absence), or the interaction of group and caffeine. Salter and Rock (1989) similarly<br />

found no major group effects due to nutrient type, but did find significant<br />

correlations between the proportion of protein actually consumed and<br />

performance. In Table 1 can be seen the correlations in the current study between<br />

the percent of nutrient consumed and MAST performance. Whereas Salter and<br />

Rock (1989) found a positive correlation for protein, the current study found a<br />

negative correlation for carbohydrate. Previous studies have found both types of<br />

effects (Lieberman et al, 1984). However, consideration of the time factor indicates<br />

that the significant negative correlation occurred even in the morning before<br />

Table 1
Correlations Between Percent of Test Food Consumed
and automated Memory and Search Task (MAST) scores

Time            Task Level            Protein (N=16)   Carbohydrate (N=18)
1 (1000 hrs)    2-character target        -.28             -.58*
                4-character target        -.36             -.58*
                6-character target        -.38             -.58*
2 (1200 hrs)    2-character target        -.03             -.65**
                4-character target        -.07             -.37
                6-character target        -.13             -.66**
3 (1430 hrs)    2-character target        -.06             -.79***
                4-character target        -.02             -.x5*
                6-character target        -.15             -.47*

* p<.05   ** p<.01   *** p<.001


consuming the carbohydrate. This study, then, is a clear example of correlation not<br />

implying causation. If anything, it appears that people who score lower on the<br />

MAST are inclined to eat more carbohydrate rather than the other way around.<br />
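The correlations reported in Tables 1 and 2 are ordinary Pearson correlations between percent of the test food consumed and each performance score. A minimal Python sketch of that computation is shown below; the variable names (pct_consumed, score_columns) are hypothetical and the code is an illustration, not the authors' analysis.

    from scipy.stats import pearsonr

    def correlate_consumption(pct_consumed, score_columns):
        # pct_consumed: list of percent-consumed values, one per subject.
        # score_columns: dict mapping a label (e.g. '1000 hrs, 2-char') to a list of scores.
        # Returns a dict of (r, p) pairs, mirroring the layout of Table 1.
        results = {}
        for label, scores in score_columns.items():
            r, p = pearsonr(pct_consumed, scores)
            results[label] = (round(r, 2), round(p, 3))
        return results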

The proportion of test meals consumed was also correlated with APTS<br />

performance, and there were no significant effects on the tasks of pattern<br />

recognition (stating whether two patterns of asterisks were the same or different),<br />

reaction time (pressing the number of whichever of four boxes lights up), or symbolic

reasoning (indicating whether each of several statements is true, for example, “A is<br />

in front of B--BA”). Several trials of the hand-tapping test data were not recorded<br />

properly and this variable could not be analyzed. See Table 2.<br />


Table 2
Correlations Between Percent of Test Food Consumed
and Automated Performance Test System (APTS) scores

Test            Time        Protein (N=16)   Carbohydrate (N=18)
Pattern         1000 hrs        -.05             -.40
Recognition     1200 hrs         .41             -.33
                1430 hrs         .14             -.40
Reaction        1000 hrs        -.22             -.27
Time            1200 hrs        -.05             -.23
                1430 hrs        -.03             -.37
Symbolic        1000 hrs        -.25             -.25
Reasoning       1200 hrs        -.07             -.44
                1430 hrs         .21             -.27

Correlations between percent consumption and various moods, however,<br />

were more often significant. Table 3 has the results, including just the moods with<br />

significant effects. Moods like dejection, fatigue, and vigor are not included in this<br />

table because none of the correlations were significant. Protein consumption was<br />

positively related to tension and anger, a finding confirming earlier reports<br />

(Banderet et al., 1986). However, the association held also in the morning before

the protein meal, again damaging the case for causation. In addition to the<br />



Table 3
Correlations Between Percent of Test Food Consumed
and Visual-Analogue Mood scores

[The correlation values in this table are not legible in the source. The table reported correlations between percent consumed and the moods Tense, Hungry, Angry, and Confused at 1000, 1200, and 1430 hrs, for the Protein and Carbohydrate groups, with significance flagged at p<.05.]


REFERENCES<br />

Banderet, L. E., Lieberman, H. R., Francesconi, R. P., Shukitt, B. L., Goldman, R. F., Schnakenberg, D. D., Rauch, T. M., Rock, P. B., and Meadors, G. F. (1986). Development of a paradigm to assess nutritive and biochemical substances in humans: A preliminary report on the effects of tyrosine upon altitude- and cold-induced stress responses. Presented at and published as Proceedings of the AGARD Aerospace Medical Panels Symposium, Biochemical Enhancement of Performance, Lisbon, Portugal, 30 Sep-2 Oct 1986.

Bittner, A. C., Smith, M. G., Kennedy, R. S., Staley, C. F., and Harbeson, M. M. (1985). Automated Portable Test (APT) System: Overview and prospects. Behavior Research Methods, Instruments, & Computers, 17, 217-221.

Lieberman, H. R., Corkin, S., Spring, B. J., Garfield, G. S., Growdon, J. H., and Wurtman, R. J. (1984). The effects of tryptophan and tyrosine on human mood and performance. Psychopharmacology Bulletin, 20, 595-598.

Lieberman, H. R., Corkin, S., Spring, B. J., Growdon, J. H., and Wurtman, R. J. (1982/83). Mood, performance, and pain sensitivity: Changes induced by food constituents. Journal of Psychiatric Research, 17, 135-145.

Morse, D. R., Schacterle, G. R., Furst, L., Zaydenberg, M., and Pollack, R. L. (1989). Oral digestion of a complex-carbohydrate cereal: Effects of stress and relaxation on physiological and salivary measures. American Journal of Clinical Nutrition, 49, 97-105.

Salter, C. A., Lester, L. S., Dragsbaek, H., Popper, R. D., and Hirsch, E. (1988). A fully automated memory and search task. In A. C. F. Gilbert (Ed.), Proceedings of the 30th Annual Conference of the Military Testing Association. Arlington, Virginia: Military Testing Association. Pp. 515-520.

Salter, C. A., and Rock, K. L. (1989). Using the memory and search task to assess dietary effects. Proceedings of the 31st Annual Conference of the Military Testing Association. San Antonio, Texas: Military Testing Association. Pp. 701-706.

Sawyer, D. A., Julia, H. L., and Turin, A. C. (1982). Caffeine and human behavior: Arousal, anxiety, and performance effects. Journal of Behavioral Medicine, 5, 415-439.

Spring, B. J., Lieberman, H. R., Swope, G., and Garfield, G. S. (1986). Effects of carbohydrates on mood and behavior. Nutrition Reviews/Supplement, 44, 51-60.



WHAT MAKES BIODATA BIODATA?

Fred A. Mael

US Army Research Institute<br />

Interest in the use of biodata in personnel selection<br />

continues to grow in all branches of the armed services. Various<br />

researchers have advanced legal, moral, and conceptual criteria<br />

that define biodata items and differentiate them from those that<br />

appear in temperament, attitude, or interest measures. These<br />

stated criteria are often disputed by other researchers or ignored in practice, even by those who proposed them. Moreover, in practice, many items termed "biodata" are indistinguishable from other

self-report items. The result has been a continued blurring of<br />

what constitutes biodata.<br />

The confusion is especially problematic in light of the<br />

claimed advantages of biodata. For example, biodata scales have<br />

been shown to be more resistant to social desirability faking<br />

than temperament scales (Telenson et al., 1983). However, this<br />

may be true only of certain types of biodata, such as verifiable<br />

items. Similarly, reviews of selection measures (Reilly & Chao,<br />

1982) stating that biodata generally achieve higher validities<br />

than temperament measures are uninterpretable without knowing<br />

what, other than empirical keying, differentiates biodata from<br />

other measures.<br />
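Empirical keying is mentioned here only in passing. For readers unfamiliar with the procedure, the sketch below illustrates one simple contrasting-groups form of it in Python: each response option is weighted by the difference in endorsement rates between high- and low-criterion groups, and an applicant's score is the sum of the weights for the options chosen. This is an assumed, minimal illustration, not the keying method used in any of the studies cited.

    def build_empirical_key(responses_high, responses_low):
        # Toy contrasting-groups key: weight each item option by the difference in
        # endorsement rates between high- and low-criterion groups.
        # responses_high / responses_low: lists of dicts, one per person, item_id -> chosen option.
        weights = {}
        items = responses_high[0].keys()
        for item in items:
            options = {p[item] for p in responses_high + responses_low}
            for opt in options:
                hi = sum(1 for p in responses_high if p[item] == opt) / len(responses_high)
                lo = sum(1 for p in responses_low if p[item] == opt) / len(responses_low)
                weights[(item, opt)] = hi - lo
        return weights

    def score_applicant(responses, weights):
        # Sum the keyed weights for an applicant's chosen options.
        return sum(weights.get((item, opt), 0.0) for item, opt in responses.items())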

The purpose of this paper is to review criteria that have<br />

been used to define biodata and differentiate it from other self-report

measures. Drawing upon the work of previous researchers,<br />

the qualities that may uniquely define biodata across all<br />

applications are enumerated. Then, additional characteristics<br />

which may be desirable or legally required under certain<br />

circumstances are discussed. In the course of the discussion,<br />

differences between biodata and temperament scales are clarified,<br />

with the two viewed as potentially complementary, though not<br />

mutually exclusive, domains.<br />

The essence of biodata<br />

Biodata items attempt to measure previous and current life<br />

events which have shaped the behavioral patterns, dispositions,<br />

and values of the person. It is presumed that a person's outlook<br />

is affected by life experiences and that each experience has the<br />

potential to make subsequent life choices more or less desirable,<br />

palatable, or feasible. One possible reason is that the focal<br />

experience reinforces a pattern of behavior. Alternatively, the<br />

focal experience may be partly or wholly determined by earlier causal determinants (genetic, dispositional, or learned) which

account for variations in both earlier and current behavior. A<br />

complete biodata measure should provide "a reasonably comprehensive description of the relevant behavioral and experiential antecedents" (Mumford & Owens, 1987, p. 3).



Virtually all life experiences are potentially "job relevant",<br />

provided that they empirically differentiate better and poorer<br />

performers on a consistent basis.<br />

Biodata Item Attributes<br />

Historical versus Hypothetical. Conceptually, biodata

should pertain solely to historical events, activities which have<br />

taken place, or continue to take place. This attribute would<br />

exclude behavioral intentions or expected behavior in a<br />

hypothetical situation.<br />

External versus Internal. Some have argued that biodata<br />

items should deal with external, though not necessarily publicly<br />

seen, actions. These criteria would exclude items about<br />

thoughts, attitudes, opinions, and unexpressed reactions to<br />

events. An item about what one typically does in situations

could satisfy the historical/external criterion.<br />

Numerous biodata researchers have utilized non-external

events in their biodata measures, and conceptually, non-external<br />

aspects of events are also capable of having significant impact<br />

on subsequent behavior. Nevertheless, the external event<br />

criterion may be crucial if claiming greater validity for biodata<br />

compared to temperament scales. Temperament scales require<br />

assessments of personal tendencies, often in areas in which<br />

people not only portray themselves favorably ("impression management"), but actually see themselves in an unrealistically favorable light ("self-deception") (Paulhus, 1984). For example,

most employees overrate their work performance compared to that<br />

of peers. Nondepressed persons consistently overrate their<br />

performance, so much that realistic self-evaluation may be<br />

indicative of depression (Mischel, 1979). Similarly, negative<br />

and positive affect orientations have been shown to be correlated<br />

with response patterns on temperament and related scales. Thus,<br />

the "normal" tendency to overrate successes and underestimate

failings can lead to self-deception and could possibly inflate<br />

responses to some temperament scales. By contrast, biodata<br />

scales dealing with external events purport to force the<br />

respondent to either answer honestly or consciously distort<br />

answers, with the assumption that fewer people will choose the<br />

latter.<br />

Objective and First-hand versus Subjective. Some who prefer that biodata be descriptions of external events also feel that biodata should be objective recollections, requiring only

the faculty of recall. Subjective interpretation of events, such<br />

as assessing if one was "disappointed", "angry", or "depressed"

in a given situation, would not fit this criterion. Evaluation<br />

of one's qualities or performance relative to that of others<br />

would also be considered subjective. A corollary would be that<br />

biodata items ask only for the first-hand knowledge of the<br />

respondent. Estimation of how others (peers, parents, teachers)<br />

would evaluate one's performance or temperament involves an<br />


additional level of speculative subjectivity. Subjective items<br />

would appear to increase the chance of self-deception. Although<br />

subjective corroboration from others is feasible, subjective<br />

items are never objectively verifiable, and hence the chance for<br />

social desirability faking is increased.<br />

Conversely, a number of biodata researchers have made<br />

frequent use of interpretive items. In some studies, subjective<br />

items have actually been shown to have higher predictive<br />

validities than objective ones. An advantage to subjective items<br />

that address self-perceptions is that they can better focus on<br />

unitary theoretical constructs. By contrast, performance of<br />

objective behaviors is often determined by multiple causes and<br />

dispositions, making it difficult to isolate the role of any one.<br />

Barge (1987) has provided evidence that homogeneous items,<br />

tapping a single disposition or tendency, are more predictive<br />

than heterogeneous items such as school or work performance.<br />

Construct-based items are also easier to use to develop<br />

rationally-based biodata scales. It would thus appear that the<br />

use of some subjective items may provide some countervailing<br />

advantages as well.<br />

Discrete versus Summary Actions. Methodologically, it may

be preferable to focus on discrete actions, dealing with a<br />

single, unique behavior (e.g. age when received driver's<br />

license), as opposed to summary responses (e.g. average time<br />

spent studying). Responses to discrete items only require memory<br />

retrieval, while summary items also require computation or<br />

estimation, thus increasing the chance of inaccuracy. However,<br />

the above preference for discrete actions would obtain only when<br />

the event is unique or singularly memorable. With a regularly<br />

performed behavior, summary recall could be more realistic and<br />

accurate than recall of a single, arbitrarily chosen instance.<br />

Verifiable. A verifiable item is an item that can be<br />

corroborated from an independent source. Item verifiability thus<br />

goes beyond both the external event and objective criteria. The<br />

optimal source of verification is archival data, such as school<br />

transcripts or work records. Alternatively, the testimony of

knowledgeable persons, such as a teacher, employer, or coach, is<br />

also considered verification by most researchers. Asher (1972)<br />

and Stricker (1987) have advocated exclusive use of verifiable<br />

items, though others utilize non-verifiable items, and some<br />

advocate interleaving verifiable and non-verifiable items<br />

(Mumford et al., 1990).<br />

One reason to use verifiable items is to reduce social<br />

desirability faking and outright falsification. However,<br />

Shaffer, Saunders, and Owens (1986) have shown that social<br />

desirability distortion is not a serious concern with biodata.<br />

Previous research on false or inaccurate responding to verifiable<br />

biodata items has shown mixed results (Cascio, 1975; Goldstein,<br />

1971) which may be due partly to methodological factors (Mumford<br />

& Owens, 1987). Merely warning respondents that answers will be<br />



verified can reduce faking (Schrader & Osburn, 1977).<br />

Verifiability should be less necessary with discrete and publicly<br />

witnessed items for which "faking good" would require conscious<br />

lying. When developing biodata, obscuring the "right" answers

and deleting transparent items should also discourage socially<br />

desirable responses, even without the threat of verification.<br />

Paradoxically, items which fit the narrowest definitions of "job<br />

relevant" and show the greatest point-to-point correspondence<br />

with future job performance would be most transparent and elicit<br />

the greatest need for verification.<br />

The issue of control. From the aforementioned perspective,<br />

that all life events have the potential to shape and affect later<br />

behavior, there is no reason to differentiate between experiences

that a person has consciously chosen to undertake and those that<br />

were components of the person's environment. In the same way<br />

that a decision to join ROTC or study chemistry may lead a person

in a behavioral direction, personal characteristics or the<br />

climate in a person's home and community could also affect<br />

subsequent behavior. Moreover, even optional decisions and<br />

behaviors, such as smoking or amount of time spent studying, are<br />

partially shaped by noncontrollable influences. This view is<br />

reflected in the instruments of biodata researchers who freely<br />

utilize both "controllable" and "noncontrollable" biodata items (Glennon, Albright, & Owens, 1966). Stricker (1987), on the other

hand, argues that it is unethical to evaluate people based on<br />

noncontrollable items pertaining to parental behavior, geographic<br />

background, or socioeconomic status. He also considers items<br />

dealing with skills and experiences not equally accessible to all<br />

applicants, such as tractor-driving ability or playing varsity<br />

football, to be unfair. Similarly, the developers of the Armed<br />

Services Applicant Profile (ASAP), a biodata measure of<br />

adaptability to the military, also attempted to delete all noncontrollable<br />

items from their instrument (Trent, Quenette, &<br />

Pass, 1989).<br />

In practice, however, consistent adherence to the control<br />

criterion would exclude all items pertaining to physical<br />

characteristics and educational level; behaviors, values, or<br />

interpersonal styles influenced by parental genetics or<br />

nurturing; and vocational interests and behavioral preferences<br />

partially shaped by one's environment. Strict adherence would<br />

thus lead to exclusion of most life experiences likely related to<br />

later behavior. It would also exclude many items typically found<br />

on school and job application blanks. This would present a<br />

severe constraint when sampling applicant pools without extended<br />

job histories, such as military applicants. It is not surprising<br />

that even some advocates of this criterion have been forced to<br />

violate it in their scales.<br />

Invasion of privacy

A final concern involves invasion of privacy. Intrusive

questions are mainly problematic with background checks that<br />



focus on previous criminal and aberrant behavior. In contrast,<br />

most biodata deal with behaviors whose revelation would not harm<br />

respondents. Some questions, such as those pertaining to marital<br />

status, age, and physical handicaps, may be invasive if the<br />

responses were to be placed in the employee's personnel folder,<br />

but not if the responses were used only by researchers to<br />

generate applicant scores. An additional reason not to reveal<br />

individual responses and their implications to decision-makers is<br />

in order to maintain biodata key confidentiality.<br />

Summary<br />

This paper proposes that the core attribute of a biodata<br />

item is that it addresses an historical event or experience. The

rationale is that previous events shape the behavioral patterns,<br />

attitudes, and values of the person, and combine with individual<br />

temperaments to define the person's identity. Other attributes,<br />

though not defining biodata, may have methodological advantages.<br />

These include limiting items to those regarding external events,<br />

those that only require objective recollection of events, and<br />

those asking only for first-person recollections. Items<br />

involving discrete, unique events, and events that are verifiable<br />

are also favored by some for these reasons. However, these<br />

latter attributes may have their own limitations. Limiting<br />

biodata to controllable life events is seen as overly<br />

restrictive. Exclusive use of verifiable and especially<br />

controllable items may hamper efforts to cover the domain of

relevant life events, as well as reduce validity. While clearly<br />

intrusive items are offensive and hence undesirable, definitions<br />

of and concerns about invasion of privacy will vary, depending on<br />

the situation.<br />

By attempting to measure historical events and experiences<br />

that may have impacted on behavioral tendencies, it should be<br />

possible to focus on a unique realm of individual differences not<br />

exhausted by temperament and other self-report measures. Perhaps<br />

biodata measures, as presently defined, could be used in tandem<br />

with temperament measures for optimal results. However,<br />

researchers should be exceedingly careful about making claims<br />

extolling biodata's virtues over other self-report measures.<br />

REFERENCES<br />

Asher, J. J. (1972). The biographical item: Can it be improved? Personnel Psychology, 25, 251-269.

Barge, B. N. (1987). Characteristics of biodata items and their relationship to validity. Paper presented at the 95th annual meeting of the American Psychological Association, NY, NY.

Cascio, W. F. (1975). Accuracy of verifiable biographical information blank responses. Journal of Applied Psychology, 60, 767-769.

Glennon, J. R., Albright, L. E., & Owens, W. A. (1966). A catalog of life history items. Greensboro, NC: Creativity Research Institute of the Richardson Foundation.

Goldstein, I. L. (1971). The application blank: How honest are the responses? Journal of Applied Psychology, 55, 491-492.

Mischel, W. (1979). On the interface of cognition and personality: Beyond the person-situation debate. American Psychologist, 34, 740-754.

Mumford, M. D., & Owens, W. A. (1987). Methodology review: Principles, procedures, and findings in the application of background data measures. Applied Psychological Measurement, 11, 1-31.

Mumford, M. D., Owens, W. A., Stokes, G. S., Sparks, C. P., and Hough, L. (1990). Developmental determinants of individual action: Theory and practice in the application of background data measures. Unpublished manuscript.

Paulhus, D. L. (1984). Two-component models of socially desirable responding. Journal of Personality and Social Psychology, 46, 598-609.

Reilly, R. R., & Chao, G. T. (1982). Validity and fairness of some alternative employee selection procedures. Personnel Psychology, 35, 1-62.

Schrader, A., & Osburn, H. G. (1977). Biodata faking: Effects of induced subtlety and position specificity. Personnel Psychology, 30, 395-405.

Shaffer, G. S., Saunders, V., & Owens, W. A. (1986). Additional evidence for the accuracy of biodata: Long-term retest and observer ratings. Personnel Psychology, 39, 791-809.

Stricker, L. J. (1987). Developing a biographical measure to assess leadership potential. Presented at the Annual Meeting of the Military Testing Association, Ottawa, Ontario.

Telenson, P. A., Alexander, R. A., & Barrett, G. V. (1983). Scoring the biographical information blank: A comparison of three weighting techniques. Applied Psychological Measurement, 7, 73-80.

Trent, T., Quenette, M. A., & Pass, J. J. (1989). An old-fashioned biographical inventory. Paper presented at the 97th Annual Convention of the American Psychological Association, New Orleans, LA.



JOB SAMPLE TEST FOR NAVY FIRE CONTROLMAN

Susan Van Hemel, PhD
Frank Alley, PhD
Syllogistics, Inc.
Springfield, VA 22151

Herbert George Baker, PhD
Laura E. Swirski
Navy Personnel Research and Development Center
San Diego, CA 92152-6800

ABSTRACT<br />

The Navy has developed job sample tests for a number of its enlisted occupations (or ratings) as part of the Joint-Service Job Performance Measurement Program. One of those ratings is Fire Controlman (FC). This paper details the development of hands-on tests for first-term FC data and radar personnel, and their administration to a sample of FCs (N=103). The results of testing are discussed, showing the relationship of test scores to several criteria.

INTRODUCTION

Several reports detail the research strategy and purposes of the Joint-Service Job Performance Measurement (JPM)/Enlistment Standards Project (Office of the Assistant Secretary of Defense, 1982), and the origin and scope of the Navy JPM Program (Laabs & Berry, 1987). In the Navy's effort, performance measures are being developed for several ratings, one of which is that of fire controlman (FC).

The MK 86 Gun Fire Control System (GFCS) is used to control various surface ship-mounted guns, which are used against both surface and airborne targets. First-term MK 86 GFCS FCs are currently trained and deployed in two different specialties. Both NECs operate the MK 86 GFCS, but NEC 1128 specializes in maintenance of the radar subsystem while NEC 1129 specializes in maintenance of the data processing subsystem. Both types of MK 86 FCs go through a training pipeline which includes Basic Electricity and Electronics, a Fire Control A-school, and a C-school for either data or radar. MK 86 FCs are usually cross-trained on the second subsystem at the end of their first or beginning of their second tour of duty.

APPROACH<br />



to develop materials to the best level of detail possible, then returned to another SME panel for critique. Use of SMEs from all three MK 86 FC C-schools ensured that all of the sites would have input into the test development process. Tryout was conducted on actual equipment to be used in testing. Final SME review and item refinement followed the tryout, and preceded the field test.

Verification of Tasks Selected for Testing

The first-term MK 86 GFCS FC is fully trained on only one of the subsystems, and although he (the rating is closed to women) may work on the other subsystem, he is not qualified on it through training or experience. Because of this, a single set of tasks could not be used; separate test items had to be developed for each subspecialty.

The first step in the development process was to verify the task list. Panels of MK 86 GFCS SMEs convened at the three MK 86 GFCS C-schools reviewed the list and suggested substitutes for tasks found unsuitable. Each task was evaluated as to its appropriateness for hands-on testing, according to the following criteria: (1) representativeness of the first-termer's job, (2) mission criticality, (3) frequency of performance, (4) sufficient variability in performance, and (5) practicality for testing at the C-school sites. In addition, the SMEs were asked to consider the need for comprehensive task coverage, and equivalent test difficulties for the two NECs.

The SMEs provided detailed information on the 7 test items for each test. Specific subtasks were confirmed for each major task, along with specific faults for the diagnostic and troubleshooting items. Additional technical documentation was provided for use in developing scoring sheets for proceduralized tasks. The test tasks were sequenced to provide the smoothest and quickest possible progression through the test, and equipment requirements were verified and refined. The information gathered enabled the test development team to begin writing draft test items, and to prepare for the first SME panel.

After the final SME review, the information and revisions were incorporated into draft test items. Plans were made for trying out the test items on a small sample of first-term MK 86 GFCS FCs.

Test Item Tryout

Four first-term MK 86 GFCS FCs, stationed on ships at Norfolk, VA, served as test subjects. Two were data FCs (NEC 1129) and two radar (NEC 1128). The equipment used for the tryout was the Dam Neck MK 86 GFCS training system, which includes a full set of actual equipment equivalent to the MOD 10 Capability Expanded shipboard system. Except for an added simulation capability for the interface to the equipment controlled by the system, and a "fan-out" version of the UYK-7(V) computer (an extra UYK-7 with the circuit card planes exposed to permit easy access), the system replicates a ship-mounted system and is composed entirely of actual equipment, housed in two connecting rooms in the school building, with a full set of system technical documentation available in the training laboratory, along with all required tools and test equipment.

The purposes of the tryout were to: (1) verify that the test items would perform properly with the equipment; (2) ensure that instructions were clear and accurate; (3) determine whether the suggested item time limits were realistic; (4) verify that there would be some variability among subjects in performance on the items; and (5) reveal unanticipated problems of any sort. Because of the small sample, there was no attempt to gather statistical information.

Because one of the purposes of the tryout was to determine whether the time limits were reasonable, a subject was allowed to continue working on a task until it was completed if he was making progress on the task, and the completion time was recorded. All subjects were able to complete the test within four hours or less, but there was considerable variability in the completion time on most of the items. This small sample did not permit confident prediction of the best time limits for all items, but did suggest some changes to suggested time limits.

The results of the tryout were positive. No major problems arose during the tryout. The test items performed well on the equipment, with only minor adjustments in procedure required (some improvements in the techniques of fault insertion, to ensure that prefaulted modules and grounding straps were not visible to examinees). The instructions were understandable, with a few areas to be clarified. The final items included these changes.

Final SME Review

The revised test items were reviewed by SMEs (two data instructors and two radar instructors) at Great Lakes. The SMEs clarified some technical issues (e.g., documentation nomenclature and fault insertion techniques).

Field Testing

Site Preparation



considerable time preparing for the tryout: rehearsing each item, verifying that the faults to be inserted would produce the desired indications, and ensuring that the training equipment was in good condition for testing. Test administrator training consisted of review and practice of the test items and procedures.

Testing Personnel

The test administrator was a retired E-8 MK 86 GFCS technician, who had served as a MK 86 Course Director for the three years preceding his retirement. One of the school senior staff (E-7/8) was available throughout. He participated in the preparation for testing, helped with equipment setup, and was able to solve the few equipment problems which occurred. Two observers were on site, with one present in the testing area during all test periods.

Test Subjects

The sample consisted of 103 individuals engaged in their first term of military service. There were 45 individuals tested in Dam Neck and 53 individuals tested in San Diego. All individuals in this sample were male. The majority of the FCs in this sample were in the third, fourth, and fifth years of their military service obligations. Sixty-one individuals were classified in the radar subrating and 42 individuals were classified in the data subrating. All (100%) of the FCs were high school graduates who either earned diplomas or GED equivalents.

Equipment

The equipment used for the field test was the same as that used for the tryout, plus equivalent equipment at the San Diego C-school.

Procedure

When the subjects arrived at the testing site, they were given a brief introduction to the project, with an explanation that their performance would in no way affect their service records, and would not be reported to anyone but project staff.

At the beginning of each testing session, the subject was given oral and written instructions on the testing procedure and the ground rules for the testing. Some biographical data were collected, and then the first item was administered. For each item, the test administrator gave oral and written instructions on the task requirements and the time allowed for completion. The subject was encouraged to ask questions before beginning the task. When the subject indicated that he was ready to begin, the test administrator instructed him to start and began timing.

At several points in the testing sequence it was necessary for the test administrator to insert or remove fault conditions or otherwise prepare the equipment for the next item. At these times (3 or 4 per test) the subject was excused and given a break of approximately five minutes.

Throughout testing, the test administrator observed the subject's actions, checking off steps performed in procedures on the scoring sheets provided, and recording and evaluating troubleshooting (non-procedural) actions on other forms. When necessary, the test administrator queried the subject to determine what he was doing or attempting. Time to complete each task was also recorded. Upon completion of testing, each subject was asked how frequently he performed each of the tested tasks on the job, and when he had most recently performed each task.

RESULTS

Each of the FC hands-on performance tests consisted of 7 tasks, each of which yielded a single, overall score ranging from 0 to 100. Correlations were computed for the radar and data subratings, combined and separately, and are shown in Table 1. The correlation between overall performance on the hands-on test and AFQT for the radar and data subratings combined was -.03. Correcting for restriction in range resulted in a correlation of .12. The correlation between overall performance on the hands-on test and AFQT for the data subrating was .30. The correlation between overall performance on the hands-on test and AFQT for the radar subrating was -.10. Correcting for restriction in range resulted in a correlation of .17 for the data subrating and .14 for the radar subrating. None of the correlations was significant.
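The paper does not state which range-restriction correction was applied. A commonly used form for selection on the predictor (Thorndike's Case II), shown here only for reference, corrects the restricted correlation r using the unrestricted and restricted standard deviations of AFQT, S and s:

    r_c = \frac{r\,(S/s)}{\sqrt{1 - r^{2} + r^{2}(S/s)^{2}}}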

Table 1. Correlations of Hands-On Performance with AFQT

Combined (Data and Radar)
  Hands-On and AFQT                                          -.03
  Hands-On and AFQT, corrected for restriction in range       .12

Data
  Hands-On and AFQT                                            .30
  Hands-On and AFQT, corrected for restriction in range        .17

Radar
  Hands-On and AFQT                                           -.10
  Hands-On and AFQT, corrected for restriction in range        .14


REFERENCES<br />

Laabs, G. J., & Berry, V. M. (1987, August). The Navy job performance measurement program: Background, inception, and current status (NPRDC TR 87-34). San Diego: Navy Personnel Research and Development Center.

Office of the Assistant Secretary of Defense (MRA&L). (1982, December). Joint-service efforts to link enlistment standards and job performance: First annual report to the House Committee on Appropriations. Washington, DC: Author.



ASVIP: AN INTEREST INVENTORY USING
COMBINED ARMED SERVICES JOBS

Herbert George Baker, PhD
Marjorie M. Sands
Navy Personnel Research and Development Center
San Diego, CA 92152-6800

Arnold R. Spokane, PhD
Spokane Career Associates
Allentown, PA 18104

ABSTRACT

A number of vocational interest inventories have been developed by the Armed Services for use in guiding enlisted personnel into military occupations. The instruments have used occupational activities, job titles, recreational activities, and so forth, or a combination of such elements. Test subjects then indicate their interests or preferences for each item. Although great efforts have been made to cross-code military jobs between the Services and with civilian occupations, until now no interest measure has used the combined Armed Services jobs. This paper describes the development and administration of the Armed Services Vocational Interest Profile (ASVIP). The instrument uses the job titles (officer and enlisted) found in the Military Career Guide, the jobs also having been assigned three-letter Holland Codes. In scoring, results indicate the most preferred Holland Code, plus an indication of high or low preferred occupational level. This paper reports on a study to measure the endorsement of the combined-services jobs. Suggestions are made for use and for further research.

INTRODUCTION<br />

Vocational interests have long been recognized as one of the many individual characteristics that affect occupational exploration, job acquisition, work satisfaction, and, perhaps, performance. There are many theories of vocational interests and job preferences, and a great number of instruments have been developed to identify and measure vocational interests. One of the major uses of these instruments is in guiding young people into the types of work for which their interests best suit them.



Similarly, a number of vocational interest inventories have been developed by the Armed Services for use in guiding enlisted personnel into military occupations. Examples include the Vocational Interest Career Examination (Alley, 1978), developed by the Air Force, and the Navy Vocational Interest Inventory (Abrahams, Lau, & Neumann, 1963). Although research has shown promise to enhance the selection and classification processes through the incorporation of a formal, measured interest component, with the exception of the Air Force, interests have remained an experimental as opposed to an operational consideration.

The various vocational interest instruments developed by the Armed Services have used occupational activities, job titles, recreational activities, and so forth, or a combination of such elements. Test subjects are asked to indicate their interests or preferences for each item. Scoring systems then report out an interest type, match the subject with an occupational area, or in some other way indicate the interests of the individual.

A few years ago, great efforts were made to cross-code military jobs between the Services and with civilian occupations, in a project sponsored by the Office of The Assistant Secretary of Defense (FM&P) (Dale, Wright, Haven, Pavlak, & Lancaster, 1989). The result was a taxonomy of what may be called combined-Services jobs -- identical to no specific job, but incorporating occupational information from each Service that has a similar job (plus the Coast Guard). It should be noted here that some jobs (e.g., infantrymen, dentists, etc.) are not represented in the occupational structure of all the Services.

The combined-Services job taxonomy offers a number of research opportunities using occupational information specific to DOD jobs. However, to date, no interest measure has used the combined Armed Services jobs.

APPROACH<br />

The combined-Services jobs, both enlisted (N=134) and officer (N=71), listed in the Military Career Guide (Department of Defense, 1988) were merged and alphabetized into a numbered list of 205 items. While there has been much controversy over the wisdom of using job titles in interest measurement, substantial research supports their use. More recently, Holland, Gottfredson, and Baker (1990), in research with Navy recruits, found that use of job titles was both feasible and meaningful. Arguably, the Navy's job titles are the most esoteric and potentially confusing to a young person, yet there were few, if any, problems in their use with young male sailors. Consequently, the even more understandable combined-Services job titles were considered fully suitable for use as items on an interest instrument.



The 205 job titles thus became items on an inventory, the Armed Services Vocational Interest Profile (ASVIP), as shown in Figure 1. The answer sheet lists, for each item, three answer options, L, I, and D (for Like, Indifferent, and Dislike, respectively). Typical scoring strategies using these options call for simply disregarding the I responses and subtracting the number of Ds from the number of Ls.
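A minimal sketch of that scoring rule is given below in Python. It also tallies Likes by the first letter of each job's Holland code, since the abstract notes that scoring reports the most preferred Holland type; the data structures (a list of (job_title, holland_code) pairs and a parallel list of L/I/D responses) are assumed for illustration and are not part of the ASVIP materials.

    from collections import Counter

    def score_asvip(items, responses):
        # items: list of (job_title, holland_code) pairs in presentation order.
        # responses: list of 'L', 'I', or 'D', one per item.
        likes = sum(1 for r in responses if r == "L")
        dislikes = sum(1 for r in responses if r == "D")   # 'I' responses are disregarded
        overall = likes - dislikes                          # Likes minus Dislikes
        by_type = Counter(code[0] for (_, code), r in zip(items, responses) if r == "L")
        return overall, by_type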

44. Construction Equipment Operators
45. Corrections Specialists
46. Court Reporters
47. Data Entry Specialists
48. Data Processing Equip. Repairers
49. Data Processing Managers
50. Dental Laboratory Technicians
51. Dental Specialists
52. Dentists
53. Detectives
54. Dietitians
55. Dispatchers
56. Divers

Figure 1. Examples of Items

The ASVIP was administered to samples of male (N=150) and female (N=150) Navy recruits, at the Great Lakes, Illinois and Orlando, Florida Navy Recruit Training Commands. There was no time limit for the test. Completion times ranged from 18 to 32 minutes, with a mean of 24 for males and 23 for females.

This pilot study, in addition to assessing the testing logistics for the instrument, was designed to reveal the levels of endorsement for the 205 jobs. Thus, for the effort reported herein, only the L responses were considered in the scoring process.

RESULTS<br />

Each of the 205 jobs received some endorsement from this sample of Navy recruits, even though not all of the jobs can be found in the Navy. The range of endorsement (out of a possible 300) is from 204 for Photographers to 25 for Clothing and Fabric Repairers. Table 1 shows the cumulative frequencies with which each of the officer and enlisted jobs was endorsed. That is, the table shows the number of times the LIKE response option was chosen for each item.

CONCLUSIONS<br />

The results of this pilot study show that there is endorsement for all combined-Services jobs, and that there is a reasonable dispersion across all of the job titles. This suggested that the ASVIP might be useful as a discussion


tool with which to begin acquainting young people with the occupational opportunities offered by the military, that is, as a guide for occupational exploration into the military working community. Administration to other-Service and civilian samples is an obvious necessity before any firm conclusions could be drawn as to the feasibility for use of the ASVIP in job exploration, counseling, and classification.

RECOMMENDATIONS FOR FURTHER RESEARCH<br />

A number of research opportun i t i es suggest themse 1 ves at<br />

this point. One would be to compare t t-l e 5 t r e n c~ t h o f<br />

endorsement across the 205 job titles with the endorsement<br />

of similar civilian job titles, the latter being information<br />

al ready avai 1 able i n the research 1 i terature.<br />

.<br />

At the time of data collection, gender information was also collected. This enables a study of differential response patterns between male and female Navy recruits. Also, data were collected with an alternative instrument using the same items, but listing them within the categories used in the Military Career Guide, rather than in an alphabetical listing. This makes it possible to study the effects of presenting items in simple alphabetization versus presenting them in ways that make possible the influences of category names on response patterns.

Furthermore, each of the 205 combined-Services jobs was coded using the Holland three-letter occupational coding system (Holland, 1985). Several studies suggest themselves: (1) comparisons between male and female endorsements across the six Holland primary codes; (2) assessment of individual response consistency within each Holland primary code; and (3) tracking the subjects and comparing performance evaluations in light of the congruence between interests and actual job assignments in the Navy.

Other possibilities include studying the differential response patterns for high and low aspiration levels (i.e., officer and enlisted jobs), in terms of both male-female differences and intra-individual consistency.

Finally, the much-discussed impact of forward-area assignment on women's job aspirations can be addressed in a small way by using the ASVIP. The instrument should be administered to another Navy female recruit population a few years hence to assess the impact of Operation Desert Shield on women's job aspirations.


REFERENCES<br />

Alley, W. E. (1973). Vocational Interest Career Examination: Use and Application in Counseling and Job Placement (AFHRL-TR-73-62). Brooks Air Force Base, TX: Personnel Research Division.

Abrahams, N. M., Lau, A. W., & Neumann, I. (1969). An Analysis of the Navy Vocational Interest Inventory as a Predictor of School Performance and Rating Assignment (NPRDC-SRR-69-11). San Diego: Navy Personnel Research Activity.

Dale, C., Wright, G., Haven, R., Pawlak, M., & Lancaster, G. (1989). The DOD Military/Civilian Master Crosswalk Project. Proceedings of the 31st Annual Conference of the Military Testing Association. San Antonio, TX: Air Force Human Resources Laboratory and USAF Occupational Measurement Center, pp. 250-255.

Department of Defense (1988). Military Career Guide, 1988-1989. Washington, DC: Author.

Holland, J. L., Gottfredson, G. D., & Baker, H. G. (1990). Validity of Vocational Aspirations and Interest Inventories: Extended, Replicated, and Reinterpreted. Journal of Counseling Psychology, 37, 3, pp. 337-342.

Holland, J. L. (1985). Manual for the Vocational Preference Inventory. Odessa, FL: Psychological Assessment Resources.



PREDICTING PERFORMANCE WITH BIODATA

Morris S. Spier, Ph.D.<br />

Somchai Dhammanungune, Ph.D.<br />

U.S. <strong>International</strong> University<br />

Herbert George Baker, Ph.D.<br />

Laura E. Swirski<br />

Navy Personnel Research and Development Center<br />

ABSTRACT<br />


A scored biographical questionnaire was developed and<br />

administered to a sample of Navy Fire Controlmen in two subratings:<br />

radar operations and data processing. The subjects<br />

were subsequently administered an extensive, hands-on test<br />

of technical proficiency. A correlational analysis<br />

identified 15 items that may predict proficiency for the<br />

radar subrating, and 20 items which may predict job<br />

performance for the data processing subrating. Cross-validation

is needed to confirm the findings.<br />

INTRODUCTION<br />

The notion that past behavior is the best predictor of<br />

future behavior both supports and receives support from the<br />

use of scored autobiographical questionnaires. Biodata has<br />

demonstrated its usefulness in predicting a range of factors in the employment setting, including: (1) career progression; (2) turnover/job tenure; (3) job satisfaction; and (4) trainability. The convergence of the findings to date supports the notion that biodata approaches tend to be

excellent predictors of a wide range of employment-related<br />

criteria.<br />

The Armed Services, in cooperation with the Department of<br />

Defense (DOD), are currently engaged in a Joint-Services Job<br />

Performance Measurement (JPM) Project of which the present<br />

research is a subtask. The larger project is investigating<br />

the feasibility of measuring on-the-job performance with an<br />

aim toward using the measures to set military enlistment<br />

standards. As a part of its contribution to the Joint-<br />

Services Project, the Navy (Laabs & Berry, 1987) is<br />

developing performance measures for a number of occupational<br />

specialties (ratings), including that of Fire Controlman (FC).

There are, thus, separate proficiency tests for radar and data processing personnel. Scoring of the test is done using a scoring sheet to grade steps in the process as having been completed either "correctly" or "incorrectly," and to grade any products produced as a part of the process as either "acceptable" or "unacceptable." The final score is a tally of the correct and acceptable actions and products.



METHOD<br />

The purpose of the present research was to develop an autobiographical questionnaire and to determine the relationship between scores on the biodata instrument and performance on the hands-on tests.

Biodata Questionnaire Development

A 124-item draft version of the Personal Activities

Inventory was developed, based on a review of the relevant<br />

literature, and on the nature of the critical tasks to be<br />

performed during the job performance test. Emphasis was<br />

placed on biodata factors associated with mechanical<br />

interests, abilities, and experience, numerical and<br />

technical/scientific interests and abilities, past<br />

experience with computers, and on work, academic, and<br />

personal experiences that might be reasonably expected, on<br />

an "armchair" basis, to be related to task performance.

Attention was similarly given to the development of items<br />

that might reflect the cognitive (e.g., attention to detail)<br />

and social (e.g., working alone or with others) processes<br />

that might be reflected in task proficiency. The 124 items<br />

were classified into 24 broader Biodata Factors. The draft<br />

version of the Inventory was reviewed and, following minor<br />

refinements, was pretested on a small sample (N=15) to<br />

determine ease of administration. No problems were found.<br />

Subjects<br />

Subjects for the biodata testing were first-term FCs. The<br />

103 sailors who were scheduled to be administered the hands-on

job performance measurement test were, thus, a sample of<br />

convenience for the present study. While predictor<br />

(biodata) scores were collected for all 103 subjects, both<br />

predictor and criterion (job performance) data were<br />

available for only 56 of the total sample tested, 25<br />

(44.6%) radar and 31 (55.4%) data processing.

Administration of the "Personal Activities Inventory"<br />

The final version of the Inventory was administered at Dam<br />

Neck and San Diego. Subjects were logged-in, given the test<br />

booklet and answer sheet, and instructed to begin. There was<br />

no time limit.<br />

Analysis of the Data<br />

Hands-on test data were entered into the computer. The raw<br />

scores for each of the seven critical tasks were summed for<br />

each subject in the form of a standard score. Data were

analyzed separately for the two subratings. The response<br />

format of each item was the determining factor in the<br />



analysis. For biodata items in which the response options<br />

represented a continuum, the biodata scores were related to<br />

the job proficiency scores using the Pearson Product Moment

Correlation. Items with dichotomous or discontinuous<br />

response options were analyzed using the Point Bi-Serial<br />

Correlation.<br />
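To make this item-level analysis concrete, the following sketch (Python with simulated data; not the software used in the study) applies the Pearson correlation to a continuous-response item and the point-biserial correlation to a dichotomous item, each against the hands-on proficiency scores.

# Sketch of the item-level correlational analysis: continuous-response
# biodata items are related to proficiency scores with the Pearson
# product-moment correlation, dichotomous items with the point-biserial.
# The toy data below are illustrative only.
import numpy as np
from scipy.stats import pearsonr, pointbiserialr

rng = np.random.default_rng(0)
proficiency = rng.normal(size=56)                           # hands-on test standard scores
continuous_item = proficiency * 0.4 + rng.normal(size=56)   # e.g., a 1-5 rating item
dichotomous_item = (proficiency + rng.normal(size=56)) > 0  # e.g., a yes/no item

r_cont, p_cont = pearsonr(continuous_item, proficiency)
r_dich, p_dich = pointbiserialr(dichotomous_item.astype(int), proficiency)

print(f"Pearson r        = {r_cont:.3f} (p = {p_cont:.3f})")
print(f"Point-biserial r = {r_dich:.3f} (p = {p_dich:.3f})")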

RESULTS

FC - Radar Operations Personnel

Table 1 shows that nine (9) Biodata Factors contained items

which correlated at a statistically significant level with<br />

the job performance data of the radar personnel. It is<br />

interesting to note that four of the items fall within the<br />

Adjustment/Emotional Maturity Factor; an additional four<br />

items deal with some aspect of Technical/Scientific,<br />

Mechanical, or Numerical Factors. Overall, 13 separate<br />

items validated against the criterion data. Table 2<br />

presents the results of the Pearson Product Moment<br />

Correlations (continuous to continuous variables) for radar.<br />

The validity coefficients range from .322 (p < .05) to .575 (p


Table 6 presents the results of the Point Bi-Serial<br />

Correlations (dichotomous to continuous variables) for the<br />

data processing subrating. Note, again, that Item #61 and<br />

Item #75 each have two response foils that reach statistical<br />

significance. As a result, the 12 Biodata Factors that<br />

validated for data processing, contain 20 statistically<br />

significant validity coefficients.<br />

DISCUSSION<br />

The data from the present study, while based on relatively<br />

small samples and still needing cross-validation, suggest optimism. Among radar personnel, 15 validity coefficients

reached levels of statistical significance across 9 Biodata<br />

Factors. Among data processing people, 20 validity<br />

coefficients reached statistically significant levels.<br />

Moreover, the sizes of the coefficients are consistent with

those reported in the literature for job proficiency in<br />

relation to biodata predictors (Mumford & Owens, 1987). In

fact, the correlations are larger than those reported for<br />

other uses of biodata to predict military proficiency where<br />

only ratings, rankings, and archival data were used as the<br />

criterion (Barge & Hough, 1986).<br />

CONCLUSIONS

A correlational analysis identified 15 items that may predict proficiency for the radar subrating and 20 items that may predict proficiency for the data processing subrating. The data suggest that a biodata test may be a

useful surrogate for job proficiency tests. However, the<br />

limitations of the study, for example, the restricted sample<br />

size, make it essential that these findings be cross-validated

to confirm and establish the predictive factors.<br />

It is further recommended that the emergent "profile" of the

ratings be used to generate hypotheses about factors that<br />

may be predictive and thus lead to a higher proportion of<br />

discriminating items. Lastly, thought should be given to<br />

extending the biodata approach to other Navy ratings.<br />

REFERENCES<br />

Barge, B.N., and Hough, L.M. (June, 1986). Utility of<br />

biographical data for predicting job performance. In<br />

Leatta M. Hough (Ed.), Literature review: Utility of<br />

temperament, biodata, and interest assessment for<br />

predicting job performance. Alexandria, VA: U.S. Army<br />

Research Institute for the Behavioral and Social Sciences.<br />

Mumford, M.D., and Owens, W.A. (March, 1987). Methodology<br />

review: Principles, procedures, and findings in the<br />

application of background data measures. Applied<br />

Psychological Measurement.<br />



DEVELOPMENT OF EQUATIONS FOR PREDICTING
TESTING IMPORTANCE OF TASKS

Walter G. Albert
William J. Phalen
Air Force Human Resources Laboratory

INTRODUCTION

The Specialty Knowledge Test (SKT) is an important component<br />

of the Weighted Airman Promotion System (WAPS). SKTs are 100-item

multiple choice achievement tests designed to measure job<br />

knowledge in various Air Force Specialties (AFSs). They are<br />

written annually for each AFS by teams of four to eight subject<br />

matter experts (SMEs). The SMEs are senior NCOs in the AFS for<br />

which a particular test is being written. A psychologist<br />

experienced in test construction procedures is assigned to each<br />

team to serve as a group facilitator.<br />

A critical part of the test construction process for any SKT<br />

is the preparation of the test outline, which guides the SMEs in<br />

determining how many questions they should write for each<br />

knowledge or duty area of the AFS. The outline used in test<br />

construction is generated in one of two ways. For many years,<br />

the SMEs created their own outline, which is referred to as the<br />

Conventional Test Outline (CTO). Recently, an automated process<br />

has been used to develop outlines for some AFSs. With this<br />

process, the Automated Test Outline (ATO) is available for use<br />

when the test development team arrives. The ATO is generated

from information gathered from testing importance (TI) surveys,<br />

where senior NCOs are asked to rate the importance of each task<br />

as to whether the knowledge(s) required to perform it should be<br />

covered by the SKT.<br />

An important advantage of the ATO procedure over the CTO

procedure is the direct link established between important tasks<br />

performed by incumbents in the AFS and test questions which<br />

address the knowledges required to perform those tasks. The ATO

process has been implemented in several AFSs, but currently it is<br />

regarded as an experimental procedure and is being evaluated<br />

against the CTO. This paper investigates whether information<br />

routinely collected from occupational surveys can be used to<br />

generate accurate TI values for each task. The resulting<br />

prediction equations could then be used to select tasks for<br />

inclusion in testing importance surveys of previously unsurveyed<br />

AFSs or to serve as a surrogate for TI, when a TI survey cannot<br />

be accomplished.<br />

OCCUPATIONAL SURVEYS<br />

An occupational inventory containing up to 2,000 task<br />

statements is administered to a large number of incumbents in<br />

each AFS. These tasks are grouped into seven to twenty duty<br />

areas. Each duty area is comprised of a group of tasks that form<br />

a major activity associated with the job specialty. Each<br />



surveyed job incumbent is requested to estimate the relative<br />

amount of time that he/she spends in performing each task on a<br />

nine-point scale that ranges from "very small amount of time" to<br />

"very large amount of time." No response means that the<br />

incumbent does not perform the task. Each of these ratings is<br />

divided by the sum of the relative time spent values for all of<br />

the tasks in the inventory to get a percentage of time spent<br />

value for the incumbent on each task. From these responses, the<br />

following values are computed for each task: (a) the percentage<br />

of incumbents performing the task (PMP), (b) the percentage of<br />

time spent by incumbents performing the task (PTM), and (c) the<br />

average pay grade of incumbents performing the task (AG).
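The computation of these task-level indices can be illustrated with the following sketch (Python, simulated ratings; the treatment of PTM as the mean percent time among performers only is an assumption of the sketch).

# Sketch of the task-level indices derived from the occupational survey:
# each incumbent's relative-time ratings (0 = task not performed, 1-9
# otherwise) are converted to percent time spent, then PMP, PTM, and AG
# are computed per task.  Array shapes and values are illustrative.
import numpy as np

ratings = np.array([        # rows = incumbents, cols = tasks; 0 = does not perform
    [3, 0, 7, 1],
    [0, 5, 5, 0],
    [2, 2, 0, 9],
], dtype=float)
paygrade = np.array([3, 4, 5])            # E-3, E-4, E-5

pct_time = 100.0 * ratings / ratings.sum(axis=1, keepdims=True)
performs = ratings > 0

pmp = 100.0 * performs.mean(axis=0)                      # % of members performing
ptm = np.nanmean(np.where(performs, pct_time, np.nan), axis=0)   # avg % time, performers only
ag = np.array([paygrade[performs[:, j]].mean() for j in range(ratings.shape[1])])

print("PMP:", np.round(pmp, 1))
print("PTM:", np.round(ptm, 1))
print("AG: ", np.round(ag, 2))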

Another survey containing the same task list as the<br />

occupational inventory is administered to a large sample of<br />

senior NCOs in each job specialty, who use a nine-point scale to<br />

estimate the difficulty in learning to perform each task<br />

successfully (TD) and the emphasis that should be given in formal<br />

training on each task for newly-hired employees (TE). Raters are<br />

asked to respond to all tasks they are familiar with, even if<br />

some of them are not part of their current job. The TD and TE<br />

values for each task are the means of the responses.<br />

CONVENTIONAL TEST OUTLINE DEVELOPMENT<br />

The CTO is organized according to broad job knowledge areas.

The test development teams spend one to two days to create CTOs<br />

by specifying and weighting knowledge categories based on their<br />

own expertise and their review of appropriate personnel<br />

classification and training documents, such as the Specialty<br />

Training Standard, which describes important duties and tasks for<br />

each job specialty; the Position Classification, which describes<br />

all duties and responsibilities for each job specialty; and the<br />

SKT abstract, which furnishes the following information for each<br />

task in the AFS: PMP, PTM, TE, AG, and TD. The SMEs decide on<br />

the number of test questions to be written on each knowledge<br />

area, based on their determination of the relative testing<br />

importance of that area.<br />

AUTOMATED TEST OUTLINE DEVELOPMENT<br />

The first step in the ATO process is to select those tasks

from the inventory that are performed by at least 50% of the<br />

incumbents or have TE values at least one standard deviation<br />

above the mean. The screening process selects approximately 150<br />

to 250 tasks for each AFS. A survey containing the selected<br />

tasks is administered to approximately 70 senior NCOs to obtain<br />

their opinions on the importance of including a question on the<br />

SKT concerning the knowledge required to successfully perform<br />

each task. The rating scale for testing importance is a seven-point scale that ranges from "no importance" to "extremely high importance." The interrater reliability of these ratings is

estimated and deviant raters are eliminated (Lindquist, 1953).

The testing importance (TI) value for each task is the mean of<br />

the ratings after deviant raters have been eliminated.<br />



An ATO is organized by duties and tasks within duties. All

tasks on the TI survey are listed under the appropriate duty.<br />

The TI values are used to weight the duties and tasks. To<br />

accomplish this weighting, the TI values for each task are<br />

squared and summed within a duty. The weight for each duty is<br />

the sum of the squared TI values across all tasks within the duty<br />

divided by the sum of the squared TI values across all duties.<br />

These weights are the percentages of test questions to be<br />

selected to cover the required knowledges to successfully perform<br />

the tasks within each duty.<br />
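The duty-weighting computation can be illustrated as follows (Python; the duty labels and TI values are invented for the example).

# Sketch of the duty-weighting step: squared TI values are summed within
# each duty and divided by the sum of squared TI values across all duties,
# giving the percentage of test questions allotted to each duty.
ti_by_duty = {
    "A. Operate console":   [5.8, 4.9, 6.2],
    "B. Maintain equipment": [4.1, 3.7],
    "C. Prepare reports":    [2.5, 3.0, 2.2, 2.8],
}

sq_sums = {duty: sum(t * t for t in tis) for duty, tis in ti_by_duty.items()}
total = sum(sq_sums.values())

for duty, ss in sq_sums.items():
    print(f"{duty:25s} weight = {100.0 * ss / total:5.1f}% of items")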

The TI value of each task within a duty is reflected by a<br />

letter from A to D. Tasks are designated as "A" tasks if their TI values are at least one standard deviation above the mean of the TI values or if their TI values are at least 6.0. Similarly, tasks are designated as "D" tasks if their TI values are more than one standard deviation below the mean of the TI values; however, all tasks with TI values of at least 4.00 are designated as "C" tasks. Of the remaining tasks, the upper 50% are designated "B" tasks and the lower 50% are designated "C" tasks.

SMEs are required to write at least one item to test the job<br />

knowledge required for every "A" task and to write no more than

three items for a single task. Procedures are available to<br />

override these restrictions; however, they require written<br />

justification. Items can be written on "D" tasks only with the

group facilitator's approval.<br />
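A sketch of these categorization rules is given below (Python, invented TI values; the reading of the 4.00 cutoff as a floor that keeps such tasks out of category "D" is an interpretation of the text).

# Sketch of the A-D categorization rules applied to a set of task TI values.
import numpy as np

ti = np.array([6.3, 5.9, 5.1, 4.8, 4.4, 4.1, 3.6, 3.0, 2.1])
mean, sd = ti.mean(), ti.std(ddof=1)

def category(t):
    if t >= mean + sd or t >= 6.0:
        return "A"
    if t < mean - sd:
        return "C" if t >= 4.0 else "D"    # the 4.00 floor keeps tasks out of "D"
    return None                             # remaining tasks: upper half B, lower half C

cats = [category(t) for t in ti]
rest = sorted([t for t, c in zip(ti, cats) if c is None], reverse=True)
cutoff = rest[len(rest) // 2 - 1] if rest else None
cats = [c if c else ("B" if t >= cutoff else "C") for t, c in zip(ti, cats)]

for t, c in zip(ti, cats):
    print(f"TI = {t:.1f} -> {c}")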

PROCEDURE<br />

Tasks for each of 26 AFSs for which testing importance<br />

indices were available (914X0, 753X0, 423X3, 791X0, 423X4, 792X1,<br />

915X0, 908X0, 392X0, 231X2, 542X2, 674X0, 552X0, 324X0, 542X1,<br />

427X3, 321XlE, 112X0, 121X0, 274X0, 321XlG, 241X0, 431X0, 275X0,<br />

566X0, and 231X0) were randomly divided into two samples--one<br />

sample designated the "validation sample" and the other sample designated the "cross-validation sample." First, the "A" tasks were randomly split between the two samples, such that each sample contained approximately an equal number of "A" tasks. The "B," "C," and "D" tasks were split between the two samples in the

same manner. Regression equations were computed separately for<br />

each validation and cross-validation sample with TI as the<br />

criterion and PMP, PTM, AG, TD, and TE as the predictor<br />

variables. The two sets of regression weights computed for each<br />

AFS were applied to the predictor scores for the cross-validation<br />

sample to generate predicted testing importance (PTI) values.<br />

The predictive efficiency of each set of weights can be measured<br />

by the Pearson coefficient of correlation (r) between TI and PTI.<br />
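The validation/cross-validation logic can be sketched as follows (Python, simulated task data; this is an illustration of the procedure, not the analysis code used in the study).

# Sketch of the split-sample step: tasks are divided into validation and
# cross-validation samples, TI is regressed on PMP, PTM, AG, TD, and TE,
# and the validation weights are applied to the cross-validation sample so
# that predictive efficiency can be read from r(TI, PTI).
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 5))                      # columns: PMP, PTM, AG, TD, TE
beta = np.array([0.6, 0.3, 0.1, 0.4, 0.7])
ti = X @ beta + rng.normal(size=n)               # simulated testing importance

idx = rng.permutation(n)
val, xval = idx[: n // 2], idx[n // 2:]          # validation / cross-validation split

def fit(rows):
    A = np.column_stack([np.ones(len(rows)), X[rows]])
    w, *_ = np.linalg.lstsq(A, ti[rows], rcond=None)
    return w

def predict(w, rows):
    return np.column_stack([np.ones(len(rows)), X[rows]]) @ w

w_val = fit(val)
pti = predict(w_val, xval)                       # validation weights on cross-validation sample
r = np.corrcoef(ti[xval], pti)[0, 1]
print(f"r(TI, PTI) on the cross-validation sample: {r:.2f}")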

If the shrinkage in r using the two sets of weights on the<br />

cross-validation sample is statistically nonsignificant (Walker &

Lev, 1953), then the data for both samples can be combined for a<br />

hierarchical clustering analysis. In this procedure, the number<br />

Of regression equations is reduced by one at each stage of the<br />

clustering by combining AFSS into groups and combining their<br />

corresponding regression data. The two most similar groups are<br />

combined at each stage, as measured by the resulting loss of<br />



overall predictive efficiency, (i.e., the reduction in r between<br />

TI and PTI). The process continues until all data are combined<br />

into a single equation. Analysis of the r losses at each stage<br />

allows identification of the fewest number of regression<br />

equations that can accurately generate PTI values across all

AFSs. In order to measure how well each set of weights would<br />

reproduce an ATO, the weights were used to classify tasks into<br />

the "A-D" categories. PTI values were classified into importance<br />

categories of A through D using a procedure identical to the one<br />

for TI values. Classification accuracy (CA) was measured by<br />

computing the table and formula shown in Figure 1.<br />

                        Predicted Classification
                         A      B      C      D

              A         F11    F12    F13    F14    R1

  Actual      B         F21    F22    F23    F24    R2
  Classification
              C         F31    F32    F33    F34    R3

              D         F41    F42    F43    F44    R4

                         C1     C2     C3     C4     N

        Fij is the frequency in the ij-th cell.

Figure 1. Classification Table and Formula

CA has been weighted such that misclassifications result in<br />

larger penalties as the "distance" between predicted

classification and correct classification becomes greater. This<br />

weighting strategy is reasonable, in that testing importance<br />

differences associated with categories in the table become<br />

greater as the "distance" between the categories increases. The

range of CA values is 0% (every classification has maximum<br />

distance from the correct classification) to 100% (every<br />

classification is correct).<br />
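Because the CA formula of Figure 1 did not reproduce legibly, the sketch below (Python) implements one plausible distance-weighted accuracy that matches the description (0% when every classification is maximally distant, 100% when all are correct); the specific penalty scheme is an assumption, not the formula from the original figure.

# One possible distance-weighted classification accuracy: the loss for a
# misclassification is proportional to the A-D category distance, normalized
# by the largest distance each task could have been misclassified by.
import numpy as np

levels = {"A": 0, "B": 1, "C": 2, "D": 3}

def classification_accuracy(actual, predicted):
    a = np.array([levels[c] for c in actual])
    p = np.array([levels[c] for c in predicted])
    max_dist = np.maximum(a, 3 - a)          # farthest possible category per task
    return 100.0 * (1.0 - np.abs(a - p).sum() / max_dist.sum())

actual    = ["A", "A", "B", "C", "D", "D"]
predicted = ["A", "B", "B", "C", "C", "A"]
print(f"CA = {classification_accuracy(actual, predicted):.1f}%")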

RESULTS<br />

The r's using weights from the cross-validation samples<br />

ranged from .44 (908X0) to .92 (914X0) and the r's using weights<br />

from the validation samples ranged from .42 (566X0) to .91<br />

(121X0). Therefore, there is a great amount of variability among

the AFSs in the ability of a linear function of the five<br />

predictors to account for the variance in TI. Because the<br />

shrinkage in r using the weights from the validation and cross-validation samples was nonsignificant (α = .05) across all AFSs,

the validation and cross-validation samples were combined for<br />

subsequent analyses.<br />



Classification tables and CA's were also computed for each

set of weights. The CA's using weights from the cross-validation<br />

samples ranged from 70% (908X0) to 92% (121X0); and for the<br />

validation samples, from 68% (231X0) to 92% (121X0). Of the<br />

4,104 tasks classified within the 26 AFSs, only four "D" tasks<br />

were classified as "A" tasks and only four "A" tasks were classified as "D" tasks. Although it is desirable to have zero A

to D or D to A misclassifications (because the test development<br />

team is being advised incorrectly to write or not write an item),<br />

infrequent misclassifications of this type should not adversely<br />

affect the construction of a valid SKT. The team can rectify<br />

these discrepancies with the permission of the group facilitator.

CA's computed for the combined data ranged from 71% (908X0) to<br />

90% (112X0). In general, the predictive accuracies using<br />

combined samples were higher than those for the validation<br />

samples referred to earlier, but all differences were small (less<br />

than 6%). Only two "D" tasks were classified as "A" tasks and two "A" tasks were classified as "D" tasks. Squared and

interactive predictor terms were added to the model for each AFS<br />

in an attempt to increase classification accuracy, but only small<br />

increases in accuracy were observed. In fact, for some AFSs,<br />

classification accuracy decreased.<br />

What is adequate classification accuracy in the context of<br />

generating an ATO? The table having the lowest CA value (68%) is<br />

shown in Figure 2. It was generated by applying the 112X0<br />

weights from the validation sample to the cross-validation<br />

sample. The impact of the misclassifications in the table is<br />

probably not too severe when it is recalled that the ATO is a guide for SMEs to use in developing an SKT, and they are free to

select tasks from any of the importance categories within the<br />

restrictions delineated above.<br />

Figure 2. Classification Table with Lowest CA Value

The r's for the combined data for each AFS ranged from .51<br />

(908X0) to .91 (112X0). A hierarchical clustering of the<br />

regression equations for all 26 AFSs showed small decreases in r

throughout most of the clustering process. For example, the<br />

overall r dropped from .84 at the 26-group stage (i.e., a<br />

separate regression equation for each of the 26 AFSs) to .79 at<br />

the 5-group stage. Thereafter, the drops in r to the 1-group stage were .02, .02, .04, and .12, respectively.
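A minimal sketch of the greedy clustering procedure described in the PROCEDURE section is given below (Python, simulated AFS data; an illustration of the logic only, not the production implementation). At each stage it merges the pair of groups whose pooled equation costs the least overall r between TI and PTI.

# Greedy agglomeration of regression equations: start with one equation per
# AFS; at each stage, merge the two groups whose combined (pooled) equation
# produces the smallest loss in overall r(TI, PTI) across all tasks.
import numpy as np

rng = np.random.default_rng(5)
afs_data = {}
for s in range(6):                                    # six toy "AFSs"
    X = np.column_stack([np.ones(120), rng.normal(size=(120, 5))])
    beta = np.concatenate([[0.0], rng.normal(scale=0.5, size=5) + 0.5])
    afs_data[s] = (X, X @ beta + rng.normal(size=120))

def pooled_weights(members):
    X = np.vstack([afs_data[m][0] for m in members])
    y = np.concatenate([afs_data[m][1] for m in members])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def overall_r(groups):
    ti, pti = [], []
    for members in groups:
        w = pooled_weights(members)
        for m in members:
            X, y = afs_data[m]
            ti.append(y)
            pti.append(X @ w)
    return np.corrcoef(np.concatenate(ti), np.concatenate(pti))[0, 1]

groups = [[s] for s in afs_data]
while len(groups) > 1:
    print(f"{len(groups)}-group stage: r = {overall_r(groups):.3f}")
    best = max(
        ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
        key=lambda ij: overall_r(
            [g for k, g in enumerate(groups) if k not in ij] + [groups[ij[0]] + groups[ij[1]]]
        ),
    )
    i, j = best
    groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [groups[i] + groups[j]]
print(f"1-group stage: r = {overall_r(groups):.3f}")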

The gradual drop in r's until the clustering at the 1-group

stage makes identification of an "optimal clustering stage"<br />

difficult. Therefore, classification was also examined at<br />

various stages. The CA's of equations at the 1-group stage

314


anged from 62% (908X0) to 89% (112X0); however, the second<br />

smallest CA was 71% (321X1E). In comparison, the CA's at the 26-group

stage ranged from 71% (908X0) to 90% (112X0), with the<br />

second smallest CA being 72% (231X0). Therefore, the range of<br />

the CA's doesn't change much between the two extremes of the<br />

clustering process. At the 26-group stage, only 2 "A" tasks were<br />

classified as "Dw tasks and only 2 "D" tasks were classified as<br />

ltA1l tasks. With the exception of one AFS (908X0), where six 'ID"<br />

tasks were classified as IlA" tasks and one "A" task was classfied<br />

as a "D" task, there were only three "A" tasks classified as "D"<br />

tasks and two "DM tasks classified as "A" tasks over all AFSs at<br />

the l-group stage.<br />

A Wilcoxon matched-pairs signed-ranks test (Siegel, 1956)<br />

was used to compare the differences in CA's between the 26-group stage and the 5-group stage and between the 5-group stage and the 1-group stage. There was a statistically significant difference (α = .05) between the 26-group and 5-group stages, but not between the 5-group and 1-group stages. Although significantly better classifications result from the use of 26 equations, 20 of 26 AFSs had differences of 5% or less (max = 13%).

If generalized equations are to be used to classify tasks in<br />

other AFSs where TI data are not available, it appears promising<br />

that a single prediction equation could generate adequate testing<br />

importance values. Further analyses are being conducted to<br />

identify the highest and lowest stages that are significantly<br />

different from the 26-group and l-group stages, respectively.<br />

CONCLUSIONS<br />

A large amount of the variance in TI was accounted for by<br />

linear combinations of the task-level predictors. The stability<br />

of least squares weights within each of the 26 AFSs was<br />

demonstrated. Prediction equations adequately classified tasks<br />

according to testing importance with very few A to D or D to A<br />

misclassifications. Use of squared and interactive predictor<br />

terms added little to predictive efficiency. A hierarchical<br />

clustering of the regression equations developed for each AFS<br />

showed small decreases in predictive efficiency throughout most<br />

of the clustering process. Preliminary results indicate that a<br />

single prediction equation may do an adequate job of classifying<br />

tasks on testing importance across all AFSs.<br />

REFERENCES<br />

Lindquist, E. F. (1953). Design and analysis of experiments in psychology and education. Boston: Houghton Mifflin Company.

Walker, H. M., & Lev, J. (1953). Statistical inference. New York: Henry Holt and Company.

Siegel, S. (1956). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.



ESTIMATING TESTING IMPORTANCE OF TASKS
BY DIRECT TASK FACTOR WEIGHTING

Authors:
William J. Phalen, Air Force Human Resources Laboratory
Walter G. Albert, Air Force Human Resources Laboratory
Darryl K. Hand, Metrica, Inc.
Martin J. Dittmar, Metrica, Inc.

INTRODUCTION

This paper is one of a series of presentations delivered at the current and previous two <strong>Military</strong> <strong>Testing</strong><br />

<strong>Association</strong> Conferences to document R&D of an automated, task-data-based outline development procedure<br />

for Air Force Specialty Knowledge Tests (SKTs). A companion paper to this one (Albert & Phalen, 1990)

provides a brief description of the automated test outline (ATO) procedure. This paper will focus on that part<br />

of the ATO procedure having to do with the selection process by which 150 to 250 tasks are selected from a job

inventory containing up to 2,000 tasks for inclusion in a Testing Importance Survey booklet. Up to now, rule-based

screening procedures have been used to identify potentially important tasks to include in the survey, with<br />

cutoffs on percent of members performing each task at the E-5 and E-6/7 paygrade levels and on the<br />

recommended training emphasis index being the primary selection criteria. A little over a year ago, research<br />

was initiated to derive and validate a minimal subset of regression equations for predicting the SME-furnished<br />

testing importance ratings in 28 AFSs with linear combinations of five task-level predictor variables, i.e., percent<br />

of members performing (PMP), percent time spent by members performing (PTM), average paygrade of

members performing (AG), task learning difficulty (TD), and field-recommended task training emphasis for first-termers

(TE). So far, it appears that possibly one, but not more than three, generalized regression equations<br />

may adequately classify tasks into their appropriate testing importance categories. These equations will,<br />

hopefully, perform several important functions. First of all, they should provide a more accurate and defensible<br />

task selection procedure for surveying AFSs that have not been previously surveyed. Secondly, the predicted<br />

testing importance (PTI) values generated by the equations should be able to serve as surrogate testing<br />

importance indices when time or budget constraints prevent the administration of testing importance surveys.<br />

Thirdly, when a new job inventory is developed and administered in an AFS whose testing importance data are<br />

based on the old job inventory tasks, the new data for the predictor variables should be available to use in<br />

conjunction with one of the generalized regression equations to generate PTI values for all the tasks in the new<br />

job inventory.<br />

But the application of these PTI equations also raises several pertinent questions: (1) How can we<br />

determine which PTI equation should be used to generate PTI values for a previously unsurveyed AFS? (2) Can<br />

SMEs provide direct estimates of AFS-specific weights for the five predictor variables that are nearly as accurate

for an AFS as the generalized regression weights? (3) Is it possible that the need for regression-generated or<br />

SME-derived weighting is obviated by simple unit weighting of the five predictor variables? The potential value<br />

of direct estimation of predictor weights by SMEs was anticipated back in 1987; accordingly, an SKT Task Factor<br />

<strong>Testing</strong> Importance Survey booklet was developed and administered to the SMEs in all AFSs for which SKTs<br />

were developed in 1988, 1989, and 1990 (to date). The booklet used in 1988 contained seven factors, the two<br />

additional ones being “consequences of inadequate performance” (CIP) and “requirement for prompt<br />

performance” (RPP), the latter being a rewording of the old “task delay tolerance” factor in order to reverse the<br />

direction of the scale and make it consistent for all factors. In 1989, it was decided to limit the task factors<br />

surveyed to the five which were routinely surveyed by the USAF Occupational Measurement Squadron<br />

(USAFOMS); thus, CIP and RPP were dropped. The elimination of the CIP and RPP factors also made it<br />

possible to assess the effect of their presence or absence on the other five factors. In 1990, the CIP and RPP<br />

factors were restored to the survey in order to introduce more variance into the profiles of the SME-furnished<br />

factor weights and thus eliminate some fuzziness from the clustering solution. The availability of data on the<br />

same seven factors for the same AFSs in 1988 and 1990 made it possible to assess the stability of factor weights<br />

over a two-year period, assuming, of course, that the SMEs in both periods were equally representative of their<br />

AFSs.



THE SURVEY INSTRUMENT<br />

The SKT Task Factor <strong>Testing</strong> Importance Survey is administered to all SMEs who have been sent by<br />

their respective commands to participate in the development of SKTs in their AFSs. To date, approximately

1,000 SMEs have been surveyed. The survey is group-administered by a member of the USAFOMS test<br />

development staff immediately following the SKT in-briefing. It takes about 10 minutes to read the instructions,<br />

fill in the background section, and provide ratings on the seven listed factors (1 to 7 scale). In order to clearly<br />

communicate what the SKT task factor rating process is all about, the rating instructions, scale, and factor

definitions as they appear in the survey booklet are shown in Figure 1.<br />

RESULTS<br />

A. Reliability Analysis. There were 35 AFSs in which the SKT Task Factor Testing Importance Survey

was administered in 1988 and again in 1990. In most instances, no SMEs appeared in both survey<br />

samples. As shown in Table 1, the average number of raters per AFS in 1988 was 3.50, and in 1990,<br />

the average was 3.59. The average correlation between the mean factor profiles (across seven<br />

factors) for the 35 AFSs was .4841 (correlations averaged through z). A value this high was

considered very acceptable, especially since it involved a two-year time interval between<br />

administrations and small numbers of different raters per AFS at both points in time. This value<br />

compares very well with the average test-retest reliability of .5835 that was obtained on task-level

testing importance ratings for 26 raters in 20 AFSs with a 3-to-4-month interval (Weissmuller,<br />

Dittmar, & Phalen, 1988). These raters were surveyed by mail and were later surveyed again when<br />

they were selected to serve on an SKT development team. The difference between the two<br />

reliability coefficients was found to be nonsignificant (p = .4337). As a further test, the 1988-to-1990<br />

factor profile correlation (r̄ = .4841) was treated as a group measure of interrater reliability (Rkk) with no time interval involved, and the Rkk was reduced to a single-rater reliability value (R11) for comparison with the mean R11 value for task-level testing importance ratings across all 28 AFSs that had been surveyed. The computed R11 value for a composite reliability (Rkk) of .4841, based on an overall average of 3.54 raters per factor profile, was .2649. The average R11 for the task-level testing importance ratings across the 28 surveyed AFSs was .240, an almost identical value. Yet, the former involved a two-year interval and the latter is a concurrent measure of internal consistency.
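The reduction of a composite (k-rater) reliability to a single-rater value can be done by inverting the Spearman-Brown formula; the sketch below (Python) shows this standard reduction with purely illustrative numbers, and whether this exact formula was the computation applied in the study is an assumption.

# Invert Spearman-Brown: r_11 = r_kk / (k - (k - 1) * r_kk).
def single_rater_reliability(r_kk, k):
    """Single-rater reliability implied by a composite of k raters."""
    return r_kk / (k - (k - 1) * r_kk)

# Illustrative values only (not the study's figures).
print(round(single_rater_reliability(r_kk=0.80, k=4), 3))   # -> 0.5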

B. Two tests were

applied to determine whether the relative weights of the common five factors were affected by<br />

adding or removing the additional two factors (i.e., CIP and RPP). In the first test, each factor was<br />

given an overall rank in terms of its mean rating in 1989 (live-factor survey) and its mean rating in<br />

1988 and 1990 separately (seven-factor surveys). The Mann-Whitney test was applied to assess the<br />

differences in the sums of ranks. The mean ratings of the PTM, AG, and TD factors were relatively unaffected by the presence or absence of the additional factors, but PMP and TE showed significant shifts in their mean ratings (p < .01). Both were significantly higher when CIP and RPP were absent (or significantly lower when CIP and RPP were present). A test was also applied to determine whether the sizes of the differences between the PMP and TE means in the five-factor vs. the seven-factor environment were related to the sizes of the mean CIP and RPP values. Regression equations of the form (mean PMP, five-factor survey − mean PMP, seven-factor survey) = W1 × (mean CIP) + W2 × (mean RPP) were applied. None of the regression results were found to be significant. Thus, while it can be said that PMP and TE were affected in a given direction by the presence or absence of CIP and RPP, there was no indication that the level of difference was proportional to the level of CIP and RPP.



SECTION II. INSTRUCTIONS

Imagine that you have been asked to review the job-task statements in the most recent USAF Job Inventory administered in the career field for which you are developing SKTs. This survey could contain anywhere from 500 to 1,200 or more task statements. Next, assume that you have been asked to rate each task statement indicating how important it is to include the job knowledges needed to perform that task on a Specialty Knowledge Test. A task would be rated high in testing importance if it requires knowledges that are critical to successful job performance within the career field.

You are in luck, however. You are not being asked to provide these 500 or more ratings. Instead, seven factors (or types of information) have been proposed as possible factors in determining the testing importance level of a task. These seven factors, along with their descriptions, are shown in Section II, SKT TASK FACTOR TESTING IMPORTANCE RATING SCALE. You are asked to rate each task factor on how important it is to consider this factor when assigning a testing importance rating to the tasks performed by airmen in the Air Force Specialty for which you are developing SKTs. Using the scale provided, determine the most appropriate rating and record your rating in the column provided.

SECTION II: SKT TASK FACTOR TESTING IMPORTANCE RATINGS

RATING SCALE FOR FACTORS IN TESTING IMPORTANCE

This factor has:

7 = Extremely High Importance
6 = High Importance
5 = Above Average Importance
4 = Average Importance
3 = Below Average Importance
2 = Low Importance
1 = No Importance

Rating    Factor

____ 1. Percent Members Performing: a measure of the proportion of all airmen who perform the task.

____ 2. Average Percent Time Spent: a measure of the proportion of the total work time that airmen in the AFS spend performing the task.

____ 3. Average Grade: the average grade of all airmen who perform the task.

____ 4. Learning Difficulty: a measure of the relative length of time required to learn to perform the task properly.

____ 5. Consequences of Inadequate Performance: a measure of the probable seriousness of failing to perform the task properly. The impact is measured in terms of possible injury or death, damage to equipment, wasted supplies or lost work-hours, etc.

____ 6. Requirement for Prompt Performance: a measure of the length of time from the moment that an airman is aware that a task will need to be done up to the point at which the task MUST be performed. In other words, does the airman have to be able to perform the task immediately, or does he or she have time to consult a manual or seek guidance?

____ 7. Field-Recommended Entry-Level Training Emphasis: a measure of how strongly NCOs in the field have recommended the task for inclusion in formal, structured training programs for entry-level airmen. Structured training may include resident technical school, on-the-job training (OJT), field training detachments (FTDs), or career development courses (CDCs).

Figure 1. SKT Rating Form<br />



C. Clustering of Factor Profiles vs. Clustering of PTI Regression Equations. One objective of

gathering task factor ratings from SMEs was to provide a means of determining which one of<br />

several generalized regression equations should be applied to previously unsurveyed AFSs to<br />

select the appropriate set of tasks for inclusion in a Task <strong>Testing</strong> Importance Survey. If AFS<br />

factor profiles produced a clustering of AFSS that corresponded to the clustering of AFSs on<br />

similarity of regression equations, then regression equation group membership could be defined<br />

for task factor clusters of AFSs for which there were no regression equations. Various attempts<br />

were made to produce corresponding clustering solutions, but no adequate match could be<br />

generated. A major impediment was the fact that even in the case in which the input sample<br />

of factor profiles contained the maximum amount of variance (1988,1989, and 1990 combined)<br />

the “between” overlap for the last two groups to merge was 86.3% and the total sample “within”<br />

overlap was 93.2%. On the other hand, the clustering of regression equations did not seem to<br />

indicate a need for more than one equation. Thus, a lack of variance was present in these data,<br />

as well. If additional research indicates that only one overall regression equation is needed for<br />

all AFSs, then the need for a procedure to select the appropriate regression equaiion for a<br />

previously unsurveyed AFS vanishes.<br />

D. Comparison of Regression- vs. Factor-Weighted Equations for Predicting Testing Importance of Tasks. Table 1 shows the predictive efficiency of the AFS-specific PTI regression equations

for 25 AFSs for which task-level testing importance indices were available and for which SMEs

had provided factor weights in 1988, 1989, or 1990. Since the derivation and validation of the<br />

regression equations and their predictive efficiency are discussed in detail in a companion paper<br />

(Albert & Phalen, 1990), the correlations of predicted and actual testing importance values for<br />

the 25 AFSs are reported here only for their comparison with the correlations produced by the<br />

SME-based factor-weighting approach (which standardizes each task factor before applying the<br />

factor weights and sums the cross-products into a testing importance composite). In Table 1,<br />

only the highest correlations computed for the 1988, 1989, and 1990 factor weights and all<br />

possible combinations thereof are reported in order to show the highest correlations this<br />

approach can hope to produce for comparison against the best alternative, i.e., the least-squares

fit of task-level indices for the five task factors (predictors) to the indices of task-level testing<br />

importance (criterion).<br />
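The factor-weighting composite described above can be sketched as follows (Python, simulated data): standardized task factors are multiplied by the SME-furnished weights and summed, and the composite is correlated with TI. A unit-weighted composite (see Section E) differs only in the weight vector.

# Sketch of the SME factor-weighting approach: standardize each task factor,
# weight it by the SME-furnished importance rating, sum the cross-products,
# and correlate the composite with the actual TI values.  Data simulated.
import numpy as np

rng = np.random.default_rng(2)
n_tasks = 150
factors = rng.normal(size=(n_tasks, 5))          # PMP, PTM, AG, TD, TE task-level indices
ti = factors @ np.array([0.5, 0.3, 0.0, 0.4, 0.6]) + rng.normal(size=n_tasks)

sme_weights = np.array([5.2, 4.8, 3.1, 5.5, 6.0])   # mean SME factor ratings (illustrative)
unit_weights = np.ones(5)

z = (factors - factors.mean(axis=0)) / factors.std(axis=0, ddof=1)

for name, w in [("SME-weighted", sme_weights), ("unit-weighted", unit_weights)]:
    composite = z @ w
    r = np.corrcoef(composite, ti)[0, 1]
    print(f"{name:13s} composite: r with TI = {r:.2f}")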

For some unexplainable reason, the 1990 factor weights uniformly produced<br />

lower correlations than the 1988 weights. Overall, the factor-derived correlations averaged to a respectable r̄ = .602 at the E-5 level and r̄ = .606 at the E-6/7 level, compared to r̄ = .798 and .786 for the E-5 and E-6/7 regression-derived correlations, respectively. The difference is significant (p < .01) in both cases, but the real difference is in the lack of uniformity of fit of

the factor-derived approach; i.e., in some cases, it matches the regression-derived correlations<br />

quite well, and in other cases rather poorly. It appears that the SME-furnished factor-weighting<br />

approach is not an acceptable alternative to the regression approach, as long as the regression<br />

alternative remains supportable.

E. Differential vs. Unit Weighting of Factors. Because there was little variance in the SME-derived

factor weights, and substantial positive correlations existed between the five task factors<br />

and the testing importance criterion, with the exception of average grade (Weissmuller, Dittmar,<br />

& Phalen, 1989), there was a distinct possibility that a unit-weighted linear composite of the<br />

standardized task factors might do almost as well as the differentially weighted composite. The<br />

effect of unit weighting on the correlations with testing importance are shown in Table 1 under<br />

the heading “Unit.” The unit weighting approach produced correlations for both the E-5 and<br />

E-6/7 levels that were generally close to the correlations derived from differential weighting by<br />

SMEs, with only two instances showing a substantial drop in correlation (both within the same<br />

AFS); but 14 correlations based on unit weighting were actually higher than those based on<br />

differential weighting. Tests of significance of difference between the r̄'s for differential and



unit weighting at the E-5 and E-6/7 levels (.602 vs. .565, and .606 vs. .582, respectively) yielded no significant differences. These findings clearly indicate that there is virtually nothing to be gained by continuing to gather factor importance ratings from SMEs, since unit weighting of the factors is equally effective.

DISCUSSION

The findings of this study suggest one positive conclusion and three negative conclusions. The positive<br />

conclusion is: (1) Factor importance weights display good reliability, even when the interval between<br />

administrations is as long as two years. The negative conclusions are: (1) The factor importance weighting<br />

approach does not yield correlations with task-level testing importance that would permit abandonment of the<br />

more rigorous regression approach, which requires the administration of task-level testing importance surveys<br />

in order to obtain criterion data for generating a least-squares solution. (2) There does not appear to be<br />

sufficient variance in the profiles of factor weights to provide a clustering of AFSs that corresponds sufficiently<br />

well with the clustering of AFS-specific regression equations; therefore, the clustering of profiles of factor weights

is not useful for indicating which generalized regression equation should be used for a particular AFS (assuming<br />

that more than one equation will be needed to adequately cover all AFSs). (3) Since unit weighting of the testing importance factors is virtually as good as SME-furnished differential weights, there is little to be gained

by continuing to gather factor importance ratings from SMEs.<br />

RECOMMENDATIONS<br />

Discontinue administration of the <strong>Testing</strong> Importance Factors Survey and concentrate instead on<br />

improving the predictive efficiency and classification accuracy of the regression-based procedure.<br />

REFERENCES<br />

Albert, W. G., & Phalen, W. J. (1990). Development of equations for predicting testing importance of tasks. Proceedings of the 32nd Annual Conference of the Military Testing Association, Orange Beach, AL.

Weissmuller, J. J., Dittmar, M. J., & Phalen, W. J. (1989). Automated test outline development: research findings (AFHRL-TP-88-70, AD-215 401). Brooks AFB, TX: Manpower and Personnel Division, Air Force Human Resources Laboratory.



Upper Body Strength and Performance in Army Enlisted MOS<br />

Elizabeth J. Brady and Michael G. Rumsey<br />

Army Research Institute<br />

Introduction<br />

Cognitive testing for selection and classification purposes<br />

has a long and distinguished history in the military services.<br />

The link between cognitive ability and soldier performance has by<br />

now been firmly established, providing a reasonably solid basis<br />

for this type of testing.<br />

The concept of screening on the basis of physical strength<br />

capability is less firmly established. A solid empirical<br />

foundation linking physical strength to overall job performance<br />

does not as yet exis:. Yet for those jobs requiring lifting or<br />

moving heavv n:,ysic?l objects, the question naturally arises as<br />

to whether SC"'- llllllirnal degree of physical strength might be an<br />

appropriate prerequisite.<br />

This question began to receive special attention in the<br />

1970's, as the number of women serving in the military, as well<br />

as the number of specialties open to women, increased<br />

dramatically. In 1976, the General Accounting Office recommended<br />

that the services develop common physical standards for males and<br />

females in specialties where physical strength attributes were<br />

relevant to effective performance. In 1982, A Women in the Army<br />

policy review evaluated the strength requirements of a variety of<br />

jobs. Then, 1984, the Army began administering the <strong>Military</strong><br />

Entrance Physical Strength Capacity Test ('MEPSCAT) to each<br />

applicant for enlistment at the <strong>Military</strong> Entrance Processing<br />

Stations (MEPS). Results of the test were used for job placement<br />

counseling rather than for determining an individual's<br />

qualification for entering any particular job.<br />

In 1987, the Army's personnel office, the Office of the Deputy Chief of Staff for Personnel (ODCSPER), determined that it was time to review its physical strength screening process. The question of most immediate concern was: are the benefits of screening worth the effort? The initial approach taken to answering the question was to explore whether there was any evidence that physical strength limitations were perceived to

interfere in any substantial way with job performance in the<br />

Army.<br />

Presented at the meeting of the <strong>Military</strong> <strong>Testing</strong><br />

<strong>Association</strong>, November, 1990. All statements expressed in this<br />

Paper are those of the authors and do not necessarily reflect the<br />

official opinions or policies of the U.S. Army Research Institute<br />

or the Department of the Army.<br />

322 I


The ODCSPER directed that a Physical Requirements<br />

Questionnaire (PRQ) be developed and administered to determine<br />

the extent to which job incumbents were perceived, by themselves<br />

or their supervisors, as having difficulty in performing their<br />

job due to upper body strength limitations. Accordingly, the<br />

U.S. Army Research Institute, in collaboration with the Enlisted<br />

Accessions Division of the ODCSPER and the Exercise Physiology<br />

Division of the Army Research Institute of Environmental<br />

Medicine, developed a 7-item supervisor version and an 11-item

incumbent version of this questionnaire. Only the results from<br />

the incumbent version will be discussed in this paper.<br />

This paper will assess the extent to which insufficient.<br />

upper body strength is perceived to interfere significantly with<br />

job performance in a representative sample of Army jobs. These<br />

self-report data will also be related to MEPSCAT scores, an<br />

objective measure of upper body strength.<br />

Method<br />

Subjects. The total sample size consisted of 11,069 (88%<br />

male, 12% female) job incumbents across 21 <strong>Military</strong> Occupational<br />

Specialties (MOS). There were 65% white, 27% black, 4% hispanic,<br />

and 4% other in this sample. The mean age for 86% of the males<br />

was 20, and 60% of the females had a mean age of 21. Due to<br />

missing data, the actual sample sizes used in the following<br />

analyses may be somewhat smaller.

Phvsical Requirements Ouestionnaire. The incumbent version<br />

of the PRQ contains 11 items, which consist of 10 multiple choice<br />

and one short answer. This version was pretested in April 1988,<br />

as part of a field test of Project A second tour measures. It<br />

was administered to 79 second tour soldiers (36 to 60 months in<br />

service) in three MOS (13B, cannon crewmember; 88M, motor<br />

transport operator; and 95B, military police). The results of<br />

the pretest indicated that the PRQ was easy to administer, that<br />

the response options were reasonable, and that it could be<br />

completed in less than 10 minutes.<br />

Phvsical Demand Cateqories. The purpose of the physical<br />

demand categories is to assign soldiers to jobs for which they<br />

are physically qualified. The categories are based on upper body<br />

strength. According to AR 611-201, the five categories are: (1)<br />

LIGHT - occasionally lift 20 pounds and frequently lift 10<br />

pounds; (2) MEDIUM - occasionally lift 50 pounds and frequently

lift 25 pounds; (3) MODERATELY HEAVY - occasionally lift 80<br />

pounds and frequently lift 40 pounds; (4) HEAVY - occasionally<br />

lift a maximum of 100 pounds and frequently lift 50 pounds; and

(5) VERY HEAVY - occasionally lift over 100 pounds and<br />

frequently lift 50 pounds. As shown in Table 1, the Project A<br />

sample has 14 Very Heavy MOS, 1 Heavy MOS, 4 Moderately Heavy<br />

MOS, 2 Medium MOS, and no Light MOS.<br />



Table 1<br />

MOS by Physical Demand Cateaories<br />

VERY HEAVY 11B Infantryman<br />

12B Combat Engineer<br />

13B Cannon Crewmember<br />

19E M48-M60 Armor Crewman<br />

19K Ml Armor Crewman<br />

27E Tow/Dragon Repairer<br />

31C Single Channel Radio Operator

51B Carpentry & Masonry Specialist<br />

54B Chemical Operations Specialist<br />

55B Ammunition Specialist<br />

63B Light Wheel Vehicle Mechanic<br />

67N Utility Helicopter Repairer<br />

88M Motor Transport Operator<br />

94B Food Service Specialist<br />

HEAVY 76Y Unit Supply Specialist<br />

MODERATELY HEAVY    16S Manpads & PMS Crewmember
                    29E Radio Repairer
                    91A Medical Specialist
                    95B Military Police

MEDIUM              71L Administrative Specialist
                    96B Intelligence Analyst

Data Collection. The objective was to collect questionnaire<br />

responses from a large number of first tour incumbents in a<br />

reasonably representative set of Army MOS, or jobs. It was<br />

determined that the most effective means of achieving this<br />

objective was to administer the PRQ as part of a large-scale data<br />

collection being conducted as one stage in a research effort,<br />

known as Project A, to improve the Army's enlisted selection and<br />

classification system. Between July, 1988 and February, 1989,<br />

the PRQ was administered to 11,069 soldiers in 21 MOS chosen to<br />

reasonably represent the full set of Army MOS for Project A<br />

purposes.<br />

Results<br />

A factor analysis with an orthogonal varimax rotation<br />

yielded two factors, which accounted for 46% of the common<br />

variance. The first factor includes items which deal with the<br />

individual's inability to get the job done; the second factor<br />

includes items which tend to focus more on ways in which to<br />

improve job performance.<br />
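As an illustrative aside from the editor (not part of the original study), a two-factor solution with an orthogonal varimax rotation of this kind can be sketched in Python; the data file and item column names below are hypothetical placeholders.

```python
# Sketch of a two-factor varimax solution like the one described above
# (not the authors' code). Column names item_1 ... item_10 are hypothetical.
import pandas as pd
from sklearn.decomposition import FactorAnalysis

prq = pd.read_csv("prq_incumbent.csv")              # hypothetical file of item responses
items = prq[[f"item_{i}" for i in range(1, 11)]].dropna()

fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
fa.fit(items)

# Rows are items, columns are the two rotated factors; inspect which items load where.
loadings = pd.DataFrame(fa.components_.T, index=items.columns,
                        columns=["factor_1", "factor_2"])
print(loadings.round(2))
```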

324<br />



Factor 1<br />

For purposes of this paper, one representative item was<br />
selected from each scale to highlight some of the<br />

principal results that emerged from our initial analyses of these<br />

data. From the first factor, the item selected reads as follows:<br />

How many times in the past six months have you had insufficient<br />

upper body strength to complete a task assignment in your MOS?<br />

The response options for question 1, and the proportion of<br />

respondents choosing each option, are shown below:<br />

Proportion<br />

Option Male Female Total<br />

1. 10 or more 7 8 7<br />

2. 5 to 9 3 5 3<br />

3. 2 to 4 8 17 9<br />

4. 1 6 6 6<br />

5. None 76 64 75<br />

Before further analyses were conducted, response options<br />

were grouped into two categories based on the degree of<br />

difficulty experienced by the respondent in performing tasks:<br />

high difficulty (options 1 and 2) and low difficulty (responses<br />

3, 4 and 5). Thus, 10% of the total group and of the males, and<br />

13% of the females, fell in the high difficulty group.<br />
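A minimal sketch of this dichotomization, assuming the responses sit in a pandas DataFrame; the file and column names ('q1', 'sex') are hypothetical.

```python
# Sketch of grouping response options into high vs. low difficulty (not the authors' code).
import pandas as pd

prq = pd.read_csv("prq_incumbent.csv")   # hypothetical file with 'q1' (options 1-5) and 'sex'

# Options 1 and 2 (5 or more occurrences) = high difficulty; options 3-5 = low difficulty.
prq["difficulty"] = prq["q1"].map(lambda opt: "high" if opt in (1, 2) else "low")

# Percent of each sex, and of the total sample, falling in the high-difficulty group.
pct_by_sex = prq.groupby("sex")["difficulty"].apply(lambda s: 100 * (s == "high").mean())
pct_total = 100 * (prq["difficulty"] == "high").mean()
print(pct_by_sex.round(1))
print(round(pct_total, 1))
```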

The next analysis examined whether this type of difficulty<br />

was related to ability to lift as measured by the MEPSCAT score<br />

obtained at the time of enlistment. Individuals were sorted into<br />

two groups based on their MEPSCAT score: one group consisting of<br />

those who were able to lift 110 pounds, and a second group<br />

consisting of those who were not. The difference between the<br />

groups was rather small: 9.7% of those with high MEPSCAT scores<br />

reported high difficulty; 11.5% of those with low MEPSCAT scores<br />

reported such difficulty.<br />

Next, results were compared across MOS. Substantial<br />

differences were found, with motor transport operators having the<br />

largest percentage (16) in the high difficulty group and radio<br />

repairers having the lowest percentage (3).<br />

The next set of analyses examined characteristics which<br />

might at least in part account for MOS differences. It was found<br />

that 11% of the soldiers in MOS with very heavy physical strength<br />

requirements, compared with 7% in the other MOS, fell in the high<br />

difficulty category. In combat MOS, 12% experienced a high<br />

degree of difficulty; in non-combat MOS, 8%.<br />

Some of the results followed no particular pattern,<br />

suggesting the need for further investigation. The greatest<br />

disparity between the sexes was found among light wheel vehicle<br />

mechanics, where 26% of the females, but only 9% of the males,<br />

325


reported high difficulty. While a fair number of males (14%)<br />

experienced difficulty in the motor vehicle transport job, this<br />

was another case where the percentage of females experiencing<br />

difficulty was particularly high (24%). Both male (13%) and<br />

female (18%) food service specialists also placed large numbers<br />

in the high difficulty category.<br />

The pattern of results for other items in the first factor<br />

generally followed the pattern for this item. However, MEPSCAT<br />

made a much greater difference with respect to a second item in<br />

this factor: How many times in the past six months have you been<br />

physically unable to lift an object while working on your Army<br />

job? The response options for this item were the same as those<br />

for the first item, and high difficulty and low difficulty were<br />

defined the same way as for the first item. On this item, 5.3%<br />

of those with high MEPSCAT scores reported high difficulty; 9.8%<br />

with low MEPSCAT scores so reported.<br />

Factor 2<br />

The item in Factor 2 chosen for close examination in this<br />

paper read as follows: How helpful do you think weight/strength<br />

training would be in improving your job performance? The<br />

response options for this item, and the proportion of respondents<br />

choosing each option, are shown below:<br />

Proportion<br />

Option Male Female Total<br />

1. Extremely helpful 31 18 30<br />

2. Helpful 30 26 29<br />

3. Somewhat helpful 18 20 18<br />

4. A little helpful 13 18 13<br />

5. Not at all helpful 8 18 10<br />

Again, for purposes of simplicity, responses were grouped<br />

into two categories. A "more helpful" category consisted of<br />
options 1 and 2; a "less helpful" category consisted of options<br />

3, 4, and 5. As can be seen above, 59% of the total group, 61%<br />

of the males, and 44% of the females, responded in the more<br />

helpful category.<br />

Among those able to lift 110 pounds, 61% were in the more<br />

helpful category, as opposed to 53% of those not able to lift 110<br />

pounds.<br />

A comparison across MOS revealed vast differences. Among<br />

cannon crewmembers, 73% were in the "more helpful" category.<br />

Among administrative specialists, 25% were in this category. In<br />

the MOS with very heavy strength requirements, 64% thought<br />

weight/strength training would be helpful or extremely helpful.<br />

In the other MOS, the percentage was only 49%. In combat MOS,<br />

the percentage was 70%; in non-combat MOS, 53%.<br />

326<br />



Discussion<br />

Certain characteristics of this effort suggest that we<br />

should treat these findings cautiously. We are dealing with<br />

self-report; thus, all limitations associated with self-report<br />

measures must be considered. We have observed a positive<br />

relationship between self-reported difficulty in lifting an<br />

object and performance on a more objective measure, the MEPSCAT,<br />

however, so we feel the results deserve to be looked at<br />

seriously. We should also point out that we are dealing here<br />

with but two items on an 11-item scale. Until we can report more<br />

thoroughly the results of all the items, as well as the results<br />

from the supervisor version of the PRQ and from a variety of<br />

additional performance measures administered concurrently with<br />

the PRQ, these results should be considered as just a slice from<br />

a much larger picture.<br />

Having expressed these caveats, what should we make of the<br />

results? The good news is that soldiers do not report widespread<br />

difficulties with the physical demands of their jobs. The<br />

somewhat surprising news is that the overall differences between<br />

self-reported male and female difficulty are not particularly<br />

great.<br />

But when we look beneath the surface, the picture is not all<br />

that simple. There are major job differences, some not terribly<br />

surprising, some perhaps deserving further investigation. Why<br />

are the physical demands of being a mechanic, for example,<br />

apparently so much greater for females than for males? Why is<br />

there a similar disparity for truck drivers?<br />

The item on weight/strength training also revealed some<br />

interesting news. It is those people who are already strongest<br />

(in terms of their MEPSCAT scores) who are most convinced of the<br />

benefits of weight/strength training. Of course, since these<br />

individuals may be concentrated in jobs where the physical<br />

demands are the greatest, the true meaning of this finding awaits<br />

further analysis. While it may not be surprising that clerks see<br />

less need for strength training than do those in combat jobs, the<br />

extent to which clerks seem to view strength training as not<br />

particularly helpful is perhaps beyond what one might expect.<br />

The results reported here are best considered as a preview<br />

of things to come. Further analyses on a data set allowing a<br />

much broader set of comparisons, and at a higher level of<br />

sophistication, than could be completed at this time will follow.<br />

Thus, we will forego the temptation to draw major conclusions<br />

until we have travelled somewhat further along the data analysis<br />

road.<br />

327


Response Distortion on the Adaptability Screening Profile (ASP)’<br />

Dale R. Palmer, Leonard A. White, and Mark C. Young<br />

U. S. Army Research Institute<br />

Alexandria, VA<br />

INTRODUCTION<br />

The Armed Services are considering the implementation of a biodata/temperament instrument,<br />

the Adaptability Screening Profile (ASP), to supplement education credentials as a predictor of first term<br />

attrition. A key problem in utilizing instruments like the ASP, especially in the "en masse" screening<br />
medium of the Armed Services, concerns the potential for item response distortion of the self-report<br />

information, and consequently, invalidation of the instrument over time (Walker, 1985). Previous research<br />

on the Armed Services Applicant Profile (ASAP) and the Assessment of Background and Life<br />

Experiences (ABLE), both components of the ASP, indicates that these instruments are susceptible to<br />
intentional distortion in the desired direction of the examinee (Hough, 1987; Trent, Atwater, & Abrahams,<br />
1986). Thus, it is possible that widespread distortion could occur in a service applicant setting,<br />
particularly if such distortion is encouraged. Guidelines may be written that "coach" applicants on how to<br />

do well on the test, ana recruiters, in order to meet quotas, might encourage or even train applicants to<br />

respond in a particular manner (Hanson, Hallam, & Hough, 1989).<br />

Prior to the research presented in this paper, we used a sample of 324 receptees to conduct a<br />

preliminary analysis of the effects of coaching on the ASP. With the assistance of military personnel, we<br />

developed a short script intended to represent “realistic” coaching that might be given to an applicant.<br />

The coaching taught examinees how to describe themselves in order to score well on the test. They<br />

were also warned that the instrument contained items to detect socially desirable responding and<br />

therefore not to answer in ways that could not possibly be true. As expected, we found that examinees<br />

can, when asked, distort their responses to the ASP in a socially desirable direction. Unexpectedly,<br />

however, the scores of examinees who were coached and warned about faking did not differ significantly<br />

from those who were responding honestly. One explanation for this result is that the warning effectively<br />

counteracted the coaching.<br />

The research reported here was designed to replicate and extend these findings. Specifically, to<br />

separate the effects of coaching and warnings about detection, one group received coaching on<br />

“correct” responding without being warned about possible detection and a second coached group was<br />

warned about faking detection. In addition, we examined the usefulness of the ABLE’s Validity scale to<br />

correctly detect those respondents who were instructed or coached to distort their responses in a<br />

socially desirable direction.<br />

Subiects<br />

METHOD<br />

Five-hundred and two male receptees were administered the ASP at the U.S. Army Reception<br />

Battalion, Ft. Sill, OK. The receptees were tested in eight groups of 14-105. Participants were informed<br />

that the purpose of the research was to learn how different test-taking strategies affect scores on the<br />

ASP.<br />

1 Presented at the meeting of the Military Testing Association, November, 1990. All statements expressed in this paper are<br />
those of the authors and do not necessarily reflect the official opinions or policies of the U.S. Army Research Institute<br />
or the Department of the Army.<br />

328


Instruments<br />


The ASP is a combination of the ASAP and ABLE. The ASAP consists of 50 multiple choice<br />

items which are combined to yield an overall score. Responses to each item are scored 1-3, with<br />

scoring weights to best predict attrition during the first term of enlistment. The ABLE is a 70-item,<br />

construct-based temperament scale comprised of three subscales to measure Achievement, Adjustment,<br />

and Dependability. These three subscale scores are combined with unit weights to form an overall ABLE<br />

composite. A fourth subscale, the ABLE Validity scale, is used to detect inaccuracy in examinees' responses<br />

caused by attempts to respond in a socially desirable manner.<br />
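As a rough illustration of the two scoring schemes just described (not the operational scoring code), the ASAP uses empirically weighted item responses while the ABLE composite is a unit-weighted sum of its three subscales. The weight matrix and example values below are random placeholders, and "one weight per item-by-response option" is only one plausible reading of the weighting.

```python
# Illustrative sketch of the scoring described above; all numbers are made up.
import numpy as np

def asap_total(responses, asap_weights):
    """responses: 50 item responses scored 1-3; asap_weights: hypothetical 50 x 3 array
    of empirically derived weights (one weight per item-by-response option)."""
    return sum(asap_weights[i, r - 1] for i, r in enumerate(responses))

def able_composite(achievement, adjustment, dependability):
    """ABLE composite: the three subscale scores combined with unit weights."""
    return achievement + adjustment + dependability

rng = np.random.default_rng(0)
weights = rng.uniform(1, 3, size=(50, 3))            # placeholder weights
print(asap_total(rng.integers(1, 4, size=50), weights))
print(able_composite(55, 33, 54))
```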

Procedure<br />

The design was a 4 x 2 between-subjects factorial with four levels of instructional condition and<br />

two orders of test administration. One-half of the subjects within each session completed the ABLE prior<br />

to the ASAP and one-half first took the ASAP followed by the ABLE. The four instructional conditions<br />

were as follows:<br />

Honest. The instructions followed those developed for the proposed operational ASP.<br />

Participants were instructed to “pick the response that best describes your attitudes or past<br />

experiences.’<br />

Fake Good. Subjects in this condition were told to "select the answer that describes yourself in<br />

a way that you think will make sure that the Army selects you....Your response should be the choice that<br />

you think would impress the Army the most.”<br />

Coached-With Warning. The instructions in this condition were designed to represent coaching<br />

strategies that might be used to help applicants for the Armed Services score well. Subjects were told<br />

to, “select the answer that describes yourself in a way that you think will make sure that Ihe Army selects<br />

you...to make a good impression...answer so that you look mature, responsible, well-adjusted, hardworking,<br />

and easy to get along with.” In addition, subjects were told to “be aware that there are<br />

questions designed to detect if you are trying to make yourself look too good. So, answer in a way that<br />

makes you look good, but try to avoid answering any of the questions in a way that cannot possibly be<br />

true.”<br />

Coached-Without Warning. Subjects in this condition received the same coaching instruction as<br />

those in the coached-with warning group, except that no warning about items to detect faking was<br />

provided.<br />

Descriptive Statistics<br />

RESULTS<br />

Table 1 presents the means and standard deviations of the six ASP subscales and composites in<br />

the four instructional conditions. Overall, mean ASP scores were highest for examinees who were<br />

coached on the “correct” responses or instructed to fake good. Note, the mean ASP scores for<br />

respondents who were warned about possible detection of faking were most similar to scores in the<br />

honest condition.<br />

Effect of Test Order and Instructional Condition<br />

Six 4 x 2 ANOVAs were used to examine the effects of instructional condition (4 levels), test<br />

order (2 levels), and their interaction on the dependent variables. The main effect of instructional<br />

condition was highly significant (p<.001) for all ASP scales. The highest F value was obtained for the<br />

329<br />

.


Table 1<br />

Effect of Instructional Conditions on ASP Scales<br />

scale<br />

Instructional Condition<br />

COACHED- COACHED-<br />

HONEST FAKE GOOD WITH WARNING NO WARNING<br />

(n=126) (n=148) (n=109) (n=100)<br />

-<br />

ASAP Total 114.58 (10.67) 120.50 (9.35) 117.59 (12.06) 120.80 (9.86)<br />

ABLE Total 142.01 (15.18) 158.33 (14.44) 144.88 (14.54) 155.51 (15.87)<br />

Achievement 54.90 (7.72) 62.13 (6.53) 56.07 (6.30) 60.13 (7.13)<br />

Adjustment 33.06 (4.50) 36.51 (4.13) 33.36 (4.64) 35.18 (4.98)<br />

Dependability 54.02 (5.49) 59.61 (5.39) 54.50 (6.22) 58.99 (5.63)<br />

Fake Validity 15.99 (3.27) 21.40 (5.50) 16.24 (3.75) 20.74 (4.77)<br />

Note. The maximum sample sizes are reported. Sample sizes vary slightly across outcome<br />

measures. Standard deviations are presented in parentheses.<br />

ABLE Validity scale, F(3, 475) = 50.75, p<.001. As shown in Table 1, the honest and coached-with<br />
warning groups had comparable means on the Validity scale, with M = 15.99 and M = 16.24,<br />
respectively. By comparison, the means on this scale were about one standard deviation higher in the<br />
fake good (M = 21.40) and coached (M = 20.74) groups. None of the main effects of test order or the<br />
treatment group by test order interaction was significant (all p>.05).<br />
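For readers who want to reproduce this kind of analysis, a minimal sketch of one of the 4 x 2 ANOVAs using statsmodels follows; the data file and column names ('able_validity', 'condition', 'test_order') are hypothetical.

```python
# Sketch of one 4 x 2 ANOVA like those reported above (not the original SPSS/SAS run).
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

asp = pd.read_csv("asp_scores.csv")              # hypothetical long-format data file

# Main effects of instructional condition (4 levels) and test order (2 levels),
# plus their interaction, on the ABLE Validity score.
model = ols("able_validity ~ C(condition) * C(test_order)", data=asp).fit()
print(sm.stats.anova_lm(model, typ=2))
```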

Effect Sizes for Instructional Group Comparisons<br />

Effect sizes for the 6 possible combinations of instructional comparisons for all ASP scales and<br />

composites are reported in Table 2. Scheffe test significance levels for each comparison are also<br />

shown.<br />

Table 2<br />

Effect Sizes(a) for Instructional Group Comparisons<br />
<br />
                Honest v.   Honest v.   Honest v.    Fake Good v.  Fake Good v.  Coached-W v.<br />
Scale           Fake Good   Coached-W   Coached-NW   Coached-W     Coached-NW    Coached-NW<br />
<br />
ASAP Total       -4.562      -0.26       -0.58*       +0.27         -0.03         -0.26<br />
ABLE Total       -0.96*      -0.19       -0.00*       +0.84*        +0.19         -0.73*<br />
Achievement      -0.90*      -0.16       -0.67*       +0.05*        +0.30         -0.64*<br />
Adjustment       -0.74*      -0.06       -0.47*       +0.60*        +0.31         -0.39<br />
Dependability    -0.91*      -0.08       -0.90*       +0.81*        +0.11         -0.72*<br />
Fake Validity    -1.01*      -0.07       -1.45*       +0.94*        +0.12         -1.20*<br />
<br />
Note. Coached-W = Coached with a warning about fake detection items in the test.<br />
Coached-NW = Coached without a warning about fake detection items in the test.<br />
(a) The difference in group means divided by the pooled group standard deviation.<br />
* p < .05.<br />
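The effect size defined in the note to Table 2 (the difference in group means divided by the pooled group standard deviation) can be computed with a short helper like the following sketch; the variable names are placeholders.

```python
# Sketch of the pooled-SD effect size used in Table 2 (not the authors' code).
import numpy as np

def pooled_effect_size(x, y):
    """Mean difference between two groups divided by their pooled standard deviation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

# Example usage with hypothetical score vectors:
# d = pooled_effect_size(honest_scores, fake_good_scores)
```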


Overall, the results replicate the findings from our previous research. As in the earlier experiment,<br />

the scores of soldiers given the "fake good" instructions were significantly higher than those of soldiers in the<br />
honest condition. This shows that the "fake good" instructions were effective in producing positive<br />

response distortion.<br />

Also, scores resulting from the honest and coached-with warning conditions were not significantly<br />

different from each other. Thus, response distortion on the ASP was reduced (but not necessarily<br />

eliminated) in the group given the ‘coached-with warning” instructions. The combination of the warning<br />

of fake detection items and instructions not to appear “too perfect” may be responsible for this<br />

suppression of positive response distortion.<br />

In our extension of the research, we also examined the effect of coaching when no warning about<br />

fake detection items is given. As shown in Table 2, soldiers in this condition had significantly higher<br />

scores than those given the “honest” instructions. However, the scores of those in the coached-without<br />

warning group did not differ significantly from those in the fake good condition. Thus, the general<br />
(faking) strategy (i.e., describing oneself in a way that ensures being selected by the Army) and the more<br />
specific (coached) strategy (trying to present oneself as mature, responsible, well-adjusted, hardworking,<br />

well organized, and easy to get along with) were equally effective in producing response<br />

distortion. Finally, in comparison with the “coached” instructions, the addition of a warning about faking<br />

detection items resulted in significantly lower scores on 4 out of the 6 scales. This demonstrates that the<br />

warning was at least partially effective in reducing response distortion.<br />

Group Differences in Correlations Among ASP Scale Scores<br />

Correlations of the Validity scale with the other ASP scales were examined in each of the four<br />

conditions. As expected, the lowest correlations with the Validity scale were found when examinees<br />

were responding honestly (r = .20 to .37, all p<.05). The highest correlations with the Validity scale<br />
were found when subjects were coached or told to fake in the socially desirable direction (r = .30 to .71,<br />
all p<.05). The correlations with the Validity scale within the coached-with warning group (r = .11 to<br />
.50) were generally higher than the correlations found within the honest group, but smaller than the<br />
correlations for the two other groups. This indicates that the coached-with warning group distorted their<br />

responses in a positive direction, but not as much as the faking or coached groups.<br />

Utility of the Validity Scale<br />
for Detecting Response Distortion<br />

The purpose of the ABLE Validity scale is to identify individuals who have distorted their responses in<br />

a socially desirable direction. We examined how effective this scale would be in correctly classifying<br />

persons who were coached or instructed to distort their ASP responses.<br />

Table 3 shows how well the Validity scale discriminates among the groups, for each possible cut<br />

score that might be used to classify distorted responses. For example, with a cut score of 27, no one in<br />

the honest group would be incorrectly classified as faking (i.e., deliberately distorting responses in a<br />

socially desirable direction). However, this cut score would correctly classify 22% of those given the<br />

fake good instructions, 15% of those coached, and 3% of those coached-with warnings. Thus, all<br />

individuals in the fake good or coached conditions who were at or above the cut score would be<br />

correctly classified as fakers. Moreover, this would be done without misclassifying anyone in the honest<br />

group (since no one in this group had a Validity score above 26). The results also show that response<br />

distortion among those given the coached-with warning instructions is most difficult to detect. This is<br />

consistent with the finding that Validity scores between the honest and coached-with warning groups do<br />

not differ significantly.
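A sketch of the tabulation behind Table 3 follows: for each candidate cut score, the percent of each group scoring at or above the cut. The score vectors here are random placeholders, not the study data.

```python
# Sketch of cut-score detection rates like those in Table 3 (placeholder data only).
import numpy as np

rng = np.random.default_rng(0)
honest     = rng.integers(11, 27, size=233)   # hypothetical ABLE Validity scores
fake_good  = rng.integers(11, 34, size=213)
coached_w  = rng.integers(11, 30, size=248)
coached_nw = rng.integers(11, 34, size=100)

def pct_at_or_above(scores, cut):
    """Percent of a group scoring at or above the candidate cut on the Validity scale."""
    return 100 * (np.asarray(scores) >= cut).mean()

for cut in range(11, 34):
    print(cut,
          round(pct_at_or_above(honest, cut), 1),      # false alarms in the honest sample
          round(pct_at_or_above(fake_good, cut), 1),   # fakers detected
          round(pct_at_or_above(coached_w, cut), 1),
          round(pct_at_or_above(coached_nw, cut), 1))
```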


Table 3<br />
Detection of Response Distortion Among Instructional Groups<br />
Using the Validity Scale<br />
<br />
                      Percent          Percent of      Percent          Percent<br />
Validity Scale        False Alarms     All Fakers      Coached-W        Coached-NW<br />
Cut Score             (in the Honest   Detected        Respondents      Respondents<br />
(at or above)         sample)                          Detected         Detected<br />
                      (n=233)          (n=213)         (n=248)          (n=100)(a)<br />
<br />
11                     100.0            100.0           100.0            100.0<br />
12                      97.4             99.1            93.1            100.0<br />
13                      91.8             97.2            89.5             99.0<br />
14                      75.5             93.0            77.4             97.0<br />
15                      62.7             89.7            66.9             93.0<br />
16                      49.4             85.0            53.6             86.0<br />
17                      40.3             79.4            46.3             81.0<br />
18                      28.3             74.2            34.6             72.0<br />
19                      19.7             66.7            23.7             65.0<br />
20                      13.7             59.2            16.0             55.0<br />
21                       9.9             51.6            11.6             46.0<br />
22                       6.0             43.2             9.6             39.0<br />
23                       3.0             37.1             6.8             32.0<br />
24                        --             29.6             4.8             24.0<br />
25                       1.3             27.7             4.0             22.0<br />
26                       0.4             23.5             2.8             19.0<br />
27                       0.0             21.6             2.8             15.0<br />
28                       0.0             16.4             2.8              8.0<br />
29                       0.0             13.1             2.0              7.0<br />
30                       0.0             10.3             0.8              5.0<br />
31                       0.0              8.9             0.4              5.0<br />
32                       0.0              1.9             0.0              2.0<br />
33                       0.0              1.4             0.0              2.0<br />
<br />
Note. Coached-W = Coached with a warning about fake detection items in the test.<br />
Coached-NW = Coached without a warning about fake detection items in the test.<br />
Except as noted, samples were aggregated from two experiments.<br />
(a) Sample was obtained from the current experiment only.<br />

DISCUSSION<br />

The results in this paper serve to corroborate earlier findings by Hanson et al. (1989), while<br />

adding new information on the effects of coaching. First, we found that the inclusion of warning<br />

statements about lie detection items seems to suppress response distortion to almost honest condition<br />

levels. The use of warning statements may be helpful to deter intentional distortion in future<br />

administrations of the ASP. Secondly, the Validity scale was found to be reasonably effective in<br />

detecting response distortion. High Validity scale scores were shown to correctly identify a substantial<br />

percentage of fakers, without misclassifying honest respondents.<br />

In addition to these findings, our results suggest that coaching instructions designed to simulate<br />

“real-life” coaching by recruiters may be no more effective in eliciting response distortion than general<br />

instructions to “fake good”. Outside guidance on distorting ASP responses may serve to motivate<br />

applicants to fake. However, it is questionable as to whether such guidance would make a significant<br />

difference in the ASP scores of applicants who would otherwise be motivated to dissemble.<br />

Finally, future research will examine how positive response distortion affects the validity of the ASP<br />

for predicting attrition. We plan to investigate the feasibility of using the Validity scale to adjust ASP<br />

scores for faking. Such an adjustment might enhance the validity of the ASP in predicting attrition, as<br />

well as other important Army criteria.<br />

332


REFERENCES<br />

Hanson, M.A., Hallam, G.L., & Hough, L.M. (1989, November). Detection of response distortion in the<br />
Adaptability Screening Profile (ASP). Paper presented at the 31st Annual Conference of the<br />
Military Testing Association, San Antonio, TX.<br />
Hough, L.M. (1987, August). Overcoming objections to use of temperament variables in selection:<br />
Demonstrating their usefulness. Paper presented at the American Psychological Association<br />
Convention, New York, NY.<br />
Trent, T., Atwater, D.C., & Abrahams, N.M. (1986, April). Experimental assessment of item response<br />
distortion. In Proceedings of the Tenth Psychology in the DoD Symposium. Colorado Springs,<br />
CO: U.S. Air Force Academy.<br />
Walker, C.B. (1985). The fakability of the Army's Military Applicant Profile (MAP). Paper presented at the<br />
Association of Human Resources Management and Organizational Behavior proceedings,<br />
Denver, CO.<br />

333<br />



PSYCHOMETRIC PROPERTIES OF A NUMBER COMPARISON TASK:<br />

MEDIUM AND FORMAT EFFECTS<br />

Banderet, L.E., Shukitt-Hale, B.L., Lieberman, H.R., Simpson’,<br />

LTC R-L., Perez’, CPT P.J., U.S. Army Research Institute of<br />

Environmental Medicine, Natick, MA, and ’ TEXCOM Armor and<br />

Engineer Board, Advanced Technology Research Div., Fort Knox, KY.<br />

ABSTRACT<br />

Researchers adapting or developing performance tasks for<br />

administration on personal computers are confronted with choices<br />

that may affect the task’s measurement properties. To eva I uate<br />

the effects of test medium, subjects completed Number Comparison<br />

(NC) tasks administered with both paper-and-pencil and portable<br />

computer media. Computer i zed NC proved super i or to<br />

paper-and-pent i I NC ; Lhe automated version had greater completion<br />

rates, ’ r-’ ‘Lbilities, and sensitivity to environmental<br />

stressors”;~ypoxia and cold).<br />

In a second study investigating task format, subjects were<br />

tested with a computerized NC task which presented either 1 or 33<br />

problems in each display window. Although the results were<br />

similar for these two formats, the response rates for the two<br />

formats were dependent upon the number of administrations. On<br />

some of the later administrations, rates for the multiple-problem<br />

format were 10% greater. Thus, formal evaluation during<br />

adaptation or development of computerized performance tasks helps<br />

ensure evolving tasks will possess reliability, sensitivity, and<br />

other useful psychometric properties.<br />

INTRODUCTION<br />

<strong>Testing</strong> performance capabilities with tasks automated by<br />

computers is more feasible today than ever before since computers<br />

possess better displays, process information faster, execute<br />

larger programs and data bases, store more information, and cost<br />

less. When a performance task is adapted or developed for<br />

administration by computer, the subject’s output responses and<br />

the instrument’s psychometric properties may change (Banderet et<br />

al., 1989; Moreland, 1987).<br />

We evaluated the automation of a performance task in two<br />

studies. In the first, an automated Number Comparison task (C-NC)<br />
was compared to its paper-and-pencil equivalent (P-NC). In the<br />
second study, the format of the display on the automated version<br />
was evaluated. Displays with a single problem were compared with<br />
displays with 33 problems. This report will describe the effects<br />
of task medium and format upon the psychometric properties of an<br />

automated NC task.<br />

334


METHOD<br />
<br />
Subjects---Twenty medical research volunteers from Fort Detrick,<br />
MD, and Natick, MA, were subjects for study 1. Thirty-two M1-A1<br />
armor personnel from Ft. Knox, KY, participated in study 2. All<br />
soldiers participated in these studies after they were given<br />
physicals and were fully informed about the conditions and procedures<br />
of the study. Investigators adhered to AR 70-25 and<br />

USAMRDC Regulation 70-25 on Use of Volunteers in Research.<br />

Assessment Instruments---The Number Comparison Task involves<br />
evaluating pairs of numbers to determine if the two numbers in<br />
each problem are the same or different. In the first study,<br />
automated and paper-and-pencil versions of the NC task were<br />
studied. The paper-and-pencil task (P-NC) was generated by<br />
computer and printed on a laser copier. The automated Number<br />
Comparison (C-NC) task was administered on a GRiD Compass<br />

portable computer. A subject’s response could not be changed<br />

after it was entered on the keyboard of the automated task.<br />

These assessment measures and experimental data are described<br />

elsewhere (Shukitt et al., 1988).<br />


In the second study, two formats of the automated NC task<br />

were studied. During testing, a display on a subject’s computer<br />

showed either 1 or 33 problems to be evaluated. The latter format<br />
was similar to the format used in study 1 on both versions of NC.<br />

Procedures---Both studies reported in this paper were repeated-measures<br />
designs and were incorporated into larger investigations<br />

with other objectives. The first was to determine if an<br />

amino acid, tyrosine, prevents some of the adverse behavioral<br />

effects induced by environmental stressors. Specifically, 20<br />
subjects were exposed to 4700 m of simulated high altitude and<br />
17°C for 7 h; on two other occasions they were exposed to 550 m and<br />
22°C (baseline). The automated and paper-and-pencil versions of<br />
the NC task were administered 300-320 minutes after ascent with<br />
10 min separating their respective administrations. Initially,<br />

subjects practiced the NC task 15 times and learned to perform<br />

quickly with


sensitivity to experimental effects, a z score was calculated<br />

since it reflected both the magnitude and variability of measured<br />

effects.<br />
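A minimal sketch of this sensitivity index, assuming paired baseline and stressor scores per subject: the mean change divided by the standard deviation of the change, as in the Altitude Effect rows of Table I below. Variable names are placeholders.

```python
# Sketch of the z-score sensitivity index described above (not the authors' code).
import numpy as np

def sensitivity_z(baseline_rates, stressor_rates):
    """Mean within-subject change divided by the standard deviation of that change."""
    change = np.asarray(stressor_rates, float) - np.asarray(baseline_rates, float)
    return change.mean() / change.std(ddof=1)

# Example usage with hypothetical per-subject correct-per-minute rates:
# z = sensitivity_z(rates_550m, rates_4700m)   # more negative = larger, more consistent decrement
```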

RESULTS<br />

At 550 m + 22°C, performance rates were greatest for the<br />
automated NC task; the P-NC rate was 87% of the C-NC rate (see Table I).<br />
Task definition for the C-NC task was also better than its manual<br />

counterpart. The reliability of administrations during the<br />

experimental and control conditions was greatest for the C-NC<br />

task. The C-NC version of the NC task was also more sensitive to<br />

altitude effects than the manual version of this performance task<br />

as inferred from z score magnitudes.<br />

TABLE I: Properties of the paper-and-pencil (P-NC) and<br />
automated (C-NC) versions of the Number Comparison Task<br />
at 550 m + 22°C and 4700 m + 15°C.<br />
<br />
CRITERION                        STATISTIC      P-NC     C-NC<br />
<br />
BASELINE VALUES (550 m + 22°C)<br />
Baseline Rates                   Mean           25.13    28.78<br />
  (correct/min)                  Sigma           8.02     8.88<br />
Minutes Practice Required        Mean           20       20<br />
  (min/admin)<br />
Task Definition                  Pearson's r      .89      .94<br />
  (admin 5 and 6)<br />
Reliability                      Pearson's r      .81      .91<br />
  (550 m vs. 4700 m)<br />
<br />
ALTITUDE + COLD EFFECTS (4700 m + 17°C)<br />
Altitude Effect                  Mean           -5.04    -5.26<br />
  (change in correct/min)        Sigma           5.08     4.07<br />
                                 z score        -0.99    -1.29<br />

336<br />

.


In the second study, the response rates for the multiple<br />

problem format interacted with administrations (i.e., days).<br />

Rates for the multiple-problem format were always greater than<br />

rates for the single-problem format; on days 3 and 4 of the field<br />

test they were approximately 10% greater (See Fig. 1). Practice<br />

requirements, task definition, and task sensitivities were<br />

comparable for the two different display formats. The larger<br />

response rates for the multiple-problem format are noteworthy<br />

since faster rates are usually associated with tasks that have<br />

superior psychometric properties.<br />

Fig. 1: Response rates (correct per minute) across days for an automated Number<br />
Comparison Task with a 1-problem or a 33-problem display. Each task was practiced<br />
four times previously (5 min per administration).<br />

DISCUSSION<br />

Automated NC was superior to its paper-and-pencil<br />

counterpart. The response rates, sensitivity to environmental<br />

stressors, and test-retest reliabilities were greater for the<br />

automated version than for the paper-and-pencil version. This<br />

demonstrates that when performance tasks are automated, modified,<br />

or developed they may have different psychometric properties than<br />

their traditional counterparts. In this evaluation, the<br />
automated version of the NC task possessed the best psychometric<br />

properties.<br />

Secondly, it is important that performance tasks be<br />
evaluated during their adaptation or development. The success of<br />
performance tasks is usually dependent upon their psychometric<br />
properties (e.g., sensitivity, requirements for practice, and<br />
test-retest reliabilities). Evaluation will ensure that the<br />

psychometric characteristics of the task can be optimized and<br />

that appropriate measures will be retained and used.<br />

337



REFERENCES<br />

Banderet, L.E., Shukitt, B.L., Walthers, M.A., Kennedy, R.S.,<br />
Bittner, A.C., Jr., & Kay, G.G. (1989). Psychometric properties<br />
of three addition tasks with different response requirements.<br />
Proceedings 30th Annual Meeting Military Testing Association (pp.<br />
440-445). Arlington, VA: U.S. Army Research Institute for the<br />
Behavioral and Social Sciences.<br />
<br />
Moreland, K.L. (1987). Computerized psychological assessment:<br />
What's available. In J.N. Butcher (Ed.), Computerized<br />
Psychological Assessment. New York: Basic Books, pp. 28-49.<br />
<br />
Shukitt, B., Burse, R.L., Banderet, L., Knight, D.R., & Cymerman,<br />
A. (1988). Cognitive performance, mood states, and altitude<br />
symptomatology in 13-21% oxygen environments (Tech. Rep. No.<br />
18/88). Natick, MA: U.S. Army Research Institute of Environmental<br />
Medicine.<br />

338


SUBJECTIVE STATES QUESTIONNAIRE: PERCEIVED WELL-BEING<br />

AND FUNCTIONAL CAPACITY<br />

Banderet, L.E., O’Mara, M., Pimental’, N.A., Ri ley, SGT R.H.,<br />

Dauphinee, SSG D.T., Witt, SSG C.E., Toyota, SGT R.M., U.S. Army<br />

Research Institute of Environmental Medicine and ‘Navy Clothing<br />

and Textile Research Facility, Natick. MA.<br />

ABSTRACT<br />

Self-rated measures of symptoms and moods are especially<br />

sensitive to stressors and often detect changes in well-being<br />
before more objective indices (Beck, 1979). We developed a<br />
40-item Subjective States Questionnaire (SSQ) to exploit such<br />
measurement properties in our research program for determining<br />
the effects of extreme environments and evaluating treatment<br />
strategies. The SSQ assesses a greater range of reactions than<br />
most symptom or mood scales and seeks estimates of a soldier's<br />
capacity to perform common soldier tasks and other familiar<br />

activities or the effort required to complete them.<br />

In a laboratory study of heat stress, SSQ data were collected<br />
during six 135-minute test sessions. Nine soldiers gave<br />

verbal ratings of “how they felt at that moment” during selected<br />

exercise, rest, and recovery intervals. Many subjective states<br />

appear sensitive to these manipulations. Ratings of most capabilities<br />

return rapidly to normal after termination of exercise<br />

and heat exposure.<br />

INTRODUCTION<br />

Self-rated measures of symptoms, moods, and behavioral capabilities<br />

are often more sensitive than objective measures of psychological<br />

phenomena (Beck, 1979). The sensitivity of self-rated<br />

measures probably results because many phenomena can be assessed<br />

with self-rated instruments, human subjects can recall and integrate<br />

personal experiences over time, and sensory and perceptual<br />

systems are most responsive to changes in stimulation or<br />

activity.<br />

To exploit the advantages of self-rated measures in our ongoing<br />

research with soldiers exposed to environmental stressors,<br />

we developed The Subjective States Questionnaire (SSQ). This<br />
questionnaire assesses perceived capability or the effort to<br />
complete a task by having the military subject relate such constructs<br />
to common soldier tasks or other familiar activities.<br />
This paper describes preliminary findings with the SSQ from an<br />
experiment where military subjects were tested experimentally in<br />
a hot physical environment while wearing various uniform ensembles.<br />

METHOD<br />

Subjects--- Nine physically fit males (average Statistics: age,<br />

23 years; height, 69 in; weight, 165 lbs) volunteered for the<br />

339


test after they were fully informed about the conditions and<br />

procedures of the study (Pimental, Avellini, & Banderet, in<br />

progress).<br />

Assessment Instruments---The SSQ is a 40-item, self-rated instrument<br />

(see Table I). It assesses perceived cognitive, memory,<br />

affective, sensory-perceptual, psychomotor, verbal, and kinesthetic<br />
capabilities. Many items operationally define estimates of<br />
such capabilities by relating them to selected common soldier<br />
tasks (HQ, Dept. of Army, 1987). For example, "I would have trouble<br />
running 2 miles in anything near my normal time." Some items are<br />
defined by relating them to familiar activities; e.g., Item 22, "I<br />
could remember spoken directions to a store a few miles from<br />
here." Twenty of the items in the SSQ are positive; e.g., "I<br />
could properly camouflage myself and my equipment." The other<br />
twenty items are negative; e.g., "If I were to drive an automobile,<br />
I might commit traffic violations or cause accidents."<br />
Each item is rated on a 6-point scale with discrete anchor<br />
points; i.e., "Not At All," "Slight," "Somewhat," "Moderate,"<br />
"Quite a Bit," and "Extremely." The SSQ can be administered as a<br />
mark-sense questionnaire, as an automated questionnaire on a computer,<br />
or as an oral survey.<br />

To simplify description and display of data from individual<br />

items of the SSQ, all ratings for negative items are recoded and<br />
their verbal descriptions are restated positively. These transformations<br />
change each negative item so it assesses a<br />
"capability" and greater ratings reflect greater capability. For<br />
example, item 29 is "I would probably miss some information in<br />
military radio messages, without some 'say agains'." During data<br />
analysis, this item's ratings are recoded and it is restated as<br />
"I could probably comprehend most information in radio messages,<br />
without some 'say agains'." After such transformations, all 40<br />
items of the SSQ assess "capabilities" and larger ratings imply<br />
greater capability.<br />
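As an editorial illustration, the recoding rule can be sketched as follows, assuming the six anchors are coded 1 through 6 ("Not At All" = 1 ... "Extremely" = 6); the coding, the item list, and the column names are assumptions, not taken from the paper.

```python
# Sketch of reverse-scoring negative SSQ items so every item reads as a capability.
import pandas as pd

NEGATIVE_ITEMS = [1, 2, 4, 6, 10, 12]   # hypothetical subset of the 20 negative items

def recode_negative(ssq: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which negative items are reversed: larger = greater capability."""
    out = ssq.copy()
    for i in NEGATIVE_ITEMS:
        out[f"item_{i}"] = 7 - out[f"item_{i}"]   # maps 1<->6, 2<->5, 3<->4
    return out
```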

Procedures---Subjects exercised for 2 hours per day in a hot<br />

environment for 6 days before manipulation of experimental conditions.<br />

This avoided confounding the effects of physiological<br />

heat acclimation with heat strain induced by the experimental<br />

conditions. Acclimatizing conditions were 95°F dry bulb and 88°F<br />
wet bulb (75% relative humidity) with a 2.0 mph wind.<br />

Then, the men were tested in a repeated-measures design to<br />

evaluate six configurations of a Navy firefighting ensemble with<br />

different heat-retaining properties (Pimental, Avellini, &<br />

Banderet, in progress). Each test day, each man wore a new<br />

configuration of the firefighting ensemble (randomly-assigned).<br />

Environmental conditions during each 2-hour experimental session<br />
were 90°F dry bulb, 79°F wet bulb (60% relative humidity) with a<br />

2 mph wind. Subjects alternately sat for 15 minutes (metabolic<br />

rate 105 watts) or walked at 3.5 mph (500 watts) on a level<br />

treadmill. The time-weighted metabolic rate was approximately 300<br />

watts.<br />

The SSQ was administered 5, 20, 95, 110, and 125 min after<br />

the start of each experimental session. Each administration began<br />

the fifth minute of a scheduled resting, walking, resting, walking,<br />
or recovery interval, respectively. During each assessment,<br />

340


each item on the SSQ was read to the group by a medical NCO. Each<br />

subject’s ratings were sensed by a lapel microphone and recorded<br />

on a separate audio channel for subsequent data encoding and<br />

analysis. The last (fifth) administration was immediately after<br />

an experimental session. Subjects removed their uniforms and<br />

monitoring equipment and were tested in a room (at normal ambient<br />

temperature) 5 min after they finished walking on the treadmill.<br />

All data were analyzed with SPSS/PC+, V3.0. Results were<br />

significant if p<0.05 (1-tailed). Data were frequently missing<br />
during a daily test session and often involved different subjects<br />
from administration to administration of an item. Paired t-tests<br />
were used in evaluating uniform ensembles rather than more traditional<br />
repeated-measures analysis of variance statistics, since<br />
this statistic was not affected by missing values which occurred<br />
during another administration in a session.<br />
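A sketch of this paired comparison, dropping subjects who are missing either rating, might look like the following; the function and variable names are hypothetical.

```python
# Sketch of a paired t-test between two ensemble configurations on one SSQ item,
# using only subjects with ratings under both configurations (not the SPSS/PC+ run).
import numpy as np
from scipy import stats

def paired_ensemble_test(ratings_a, ratings_b):
    a = np.asarray(ratings_a, float)
    b = np.asarray(ratings_b, float)
    keep = ~np.isnan(a) & ~np.isnan(b)          # exclude subjects missing either rating
    return stats.ttest_rel(a[keep], b[keep])

# Example usage with hypothetical per-subject ratings:
# result = paired_ensemble_test(item24_config1, item24_config2)
```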

RESULTS<br />

These data demonstrate responsiveness of the SSQ to heat<br />

strain induced by metabolic heat production from exercise and<br />

high environmental temperatures. These data reflect average sub-<br />

ject responses under conditions of increasing heat storage<br />

induced by walk-rest activities.<br />

Activity-time changes on individual SSQ items during experimental<br />
sessions suggested three trends: decreased capabilities<br />
during the session with rapid recovery afterwards (Fig. 1),<br />
decreased capabilities without rapid recovery (Fig. 2), and no<br />
apparent changes for some capabilities (Fig. 3). Each bar in Figs.<br />
1-3 has a "+" symbol above it; the horizontal bar on each symbol<br />

is the standard error of the mean for that data point. This<br />

report only shows illustrative data because of space limitations.<br />

Analysis of these data for other purposes required use of<br />

multiple comparisons to evaluate various configurations of the<br />

firefighting ensemble for different activities-times during the<br />

session. Table II shows items which appear most frequently<br />
affected by these conditions since these items were more often<br />

statistically significant for these comparisons.<br />

On some items, perceived capabilities are least during exercise,<br />
e.g., Item 24: "I would have trouble running 2 miles in<br />
anything near my normal time." On other items, perceived capabilities<br />
are least during rest following exercise, e.g., Item 35:<br />
"I feel as good as I usually feel." Most capabilities recover<br />
rapidly following exercise and heat exposure since values obtained<br />
5-10 min after the end of the experimental challenge are<br />
similar to baseline values. A few capabilities recover more<br />
slowly since they are still impaired during the last administration.<br />

Missing data were evident for all conditions but were more<br />
frequent during exercise or when subjects were approaching medical<br />
safety limits or feeling ill. Although some data were lost<br />

because of equipment and procedural shortcomings, most missing<br />

data were caused by failures of the subjects to respond when they<br />

were uncomfortable or preoccupied with other activities.<br />

341


TABLE I: Actual Items on The Subjective States Questionnaire.<br />
1. I feel "overwhelmed."<br />
2. I feel "vulnerable."<br />
3. Right now, I could answer most promotion board questions.<br />
4. It would be more difficult than usual to understand new concepts that are being taught in a military class.<br />
5. My thinking and other mental processes are at their "max."<br />
6. It would require more effort than usual to tell someone how to "shoot an azimuth."<br />
7. My vision seems especially sharp and clear.<br />
8. My thoughts seem complete.<br />
9. I feel like spit shining my boots and polishing my brass.<br />
10. My body feels clumsy and awkward in this situation.<br />
11. I could complete gas mask confidence training, including unmasking in the "gas chamber" with no difficulty.<br />
12. It would take more effort than usual to complete a land navigation course.<br />
13. I feel "out of touch" with my surroundings.<br />
14. I feel confused.<br />
15. I could easily play a difficult video game for 20-25 minutes.<br />
16. My thinking seems "sluggish."<br />
17. I am having trouble remembering some things now.<br />
18. Staying in this study hardly seems worth it.<br />
19. Sending a grid coordinate by radio would require greater effort than usual.<br />
20. If I were driving a motor vehicle, my actions would seem "jerky" and "unconnected."<br />
21. I could properly camouflage myself and my equipment.<br />
22. I could remember spoken directions to a store a few miles from here.<br />
23. If I were driving an automobile, I might commit traffic violations or cause accidents.<br />
24. I would have trouble running 2 miles in anything near my normal time.<br />
25. I can talk freely without stuttering.<br />
26. A 2-3 hour G.I. party might be difficult to "deal with."<br />
27. Telling even a short joke would require more effort than usual.<br />
28. If a "password" and "challenge" were changed every two hours, it might be difficult for me to remember them.<br />
29. I would probably miss some information in military radio messages, without some "say agains."<br />
30. I feel disoriented.<br />
31. I am as aware of feelings in my arms, legs, and body as I usually am.<br />
32. It would be hard to be up for 24 hours of guard duty now.<br />
33. I could disassemble and reassemble an M-16 correctly within time limits.<br />
34. Detecting a soldier in BDUs in tall brush would take more effort than it usually does.<br />
35. I feel as good as I usually feel.<br />
36. I would confuse some of the azimuths with the directions they represent.<br />
37. I feel "ate up."<br />
38. My memory is working as well as it usually does.<br />
39. I feel good enough to max at least one part of the PT test.<br />
40. I would find it more difficult than usual to find a landmark such as railroad tracks on a map.<br />

FIG. 1: SSQ item 3, "Right Now, I Could Answer Most Promotion Board Questions," shows decreased capability with<br />
increasing heat strain, with partial recovery following termination of heat exposure and exercise. (Capability rating<br />
plotted against session time: Baseline, Walk 1, Rest 3, Walk 4, and post exposure.)<br />

342<br />

FIG. 2: Transformed SSQ item 1, "I Feel In Control (vs. Overwhelmed)," shows decreased capability with increasing<br />
heat strain with little, if any, recovery following termination of heat exposure and exercise. (Capability rating,<br />
LO to HI, plotted against the same session intervals as Fig. 1.)<br />

FIG. 3: Transformed SSQ item 36, "I Could Associate Azimuths With The Directions That They Represent," shows little,<br />
if any, effect upon capability with increasing heat strain or termination of heat exposure and exercise. (Capability<br />
rating plotted against the same session intervals as Figs. 1 and 2.)<br />

TABLE II: Actual Items from The Subjective States Questionnaire that yielded frequent statistically significant differences<br />
on comparisons of firefighting ensembles.<br />
<br />
1. I feel "overwhelmed."<br />
2. I feel "vulnerable."<br />
3. Right now, I could answer most promotion board questions.<br />
10. My body feels clumsy and awkward in this situation.<br />
12. It would take more effort than usual to complete a land navigation course.<br />
17. I am having trouble remembering some things now.<br />
18. Staying in this study hardly seems worth it.<br />
23. If I were driving an automobile, I might commit traffic violations or cause accidents.<br />
24. I would have trouble running 2 miles in anything near my normal time.<br />
30. I feel disoriented.<br />
37. I feel "ate up."<br />

343<br />




Acceptability of the questionnaire to our military test subjects<br />

was better than most symptom, mood, or personality questionnaires<br />

that we have administered before. This observation was<br />

supported by discussions about some items and comments suggesting<br />

the items were relevant to a soldier’s training and experiences.<br />

Subjects volunteered that items also made them think about the<br />

implications of performing military tasks in stressful situations.<br />

DISCUSSION<br />

This study explored the perceived capabilities of soldiers<br />

to perform common soldier tasks and other familiar activities<br />

under varying degrees of heat strain. Varied human capabilities<br />

were affected by heat exposure and exercise. Interestingly, most<br />

items showed recovery even 5 min after termination of heat exposure<br />

and exercise. These preliminary results suggest that the<br />

SSQ may be useful in other situations which use military personnel<br />
as test subjects. The content of items fosters interest and<br />

cooperation, useful assets especially in challenging testing<br />

situations.<br />

Surveying a group of subjects orally, as was done in the<br />
present study, is advantageous in some experimental situations,<br />
particularly when subjects are exercising or performing a task.<br />
To minimize missing data when the SSQ is administered orally,<br />
special emphasis must be given since there are fewer sanctions to<br />
encourage responding on each item than with other forms of questionnaire<br />

administration.<br />

These data demonstrate that subjects can provide systematic<br />

estimates of their perceived capabilities for varied tasks. Although<br />

this study did not validate subject estimates of their<br />

capabilities, the time courses of soldier capabilities appear<br />

plausible. Furthermore, the recovery of some capabilities in 5<br />

min or less emphasizes the limitations of using a “post” session<br />

measure to approximate “status” during an earlier stressful challenge.<br />

This observation also illustrates the importance of<br />
sampling at appropriate times so that the time course of a phenomenon<br />

can be accurately measured.<br />

REFERENCES<br />

Beck, A.T. Cognitive therapy for depression. New York: Guilford<br />
Press, 1979.<br />
Headquarters, Department of Army. Soldier's manual of common<br />
tasks (skill level 1), STP 21-1-SMCT, 1987.<br />
Pimental, N.A., Avellini, B.A., and Banderet, L.E. Comparison of<br />
heat stress when the Navy fire fighter's ensemble is worn in<br />
various configurations. Technical Report * , Navy Clothing and<br />
Textile Research Facility, Natick, MA, (in progress).<br />

344


Validity of Grade Point Average:<br />

Does the College Make a Difference?<br />

Diane L. iiomnglia, ClC<br />

Jacobina Skinner<br />

Manpower and Personnel Division<br />
Air Force Human Resources Laboratory<br />

Throughout the military and private sector, undergraduate grade point<br />

average (GPA) plays an important role in job selection decisions. This<br />
measure of academic achievement and demonstrated ability is widely held to<br />
predict employee performance. Recent literature reviews show significant but<br />
modest relationships between GPA and employee performance both in training and<br />
on the job (e.g., Dye & Reck, 1988).<br />

An issue raised by the use of GPA as a personnel selection factor<br />

concerns the possible lack of equivalence in the grade scale across colleges.<br />

The implication of these inequivalencies for employers is that expected<br />
performance would vary among job applicants who have the same GPA, but who<br />
graduated from different colleges. Research on this issue is sparse, but two<br />
studies suggest that a school factor may moderate the GPA-performance<br />
relationship. Dye and Reck (1988) have found correlations for graduates of<br />
the same college to be higher on average than those for graduates of different<br />
colleges. Further evidence that college characteristics may influence the<br />
predictability of GPA has been reported for Air Force officers commissioned<br />
from the Reserve Officer Training Corps (ROTC) program (Barrett & Armstrong,<br />
1989). Performance prediction was improved by considering a quality measure<br />
for the officers' college (Scott, 1984) in addition to their GPA.<br />

The current study extends the investigation of the college and GPA issue<br />
in the Air Force to a second officer commissioning source: the Officer<br />
Training School (OTS) at Lackland AFB, TX. The findings of a two-phase study<br />
of the relationship between GPAs awarded to cadets graduating from different<br />
colleges and their subsequent performance in OTS are reported. The study<br />

design was previously described by Skinner and Armstrong (1990). In the<br />

analytic phase, the initial focus was on the validity or’ Gi?A as a cadet<br />

selector. Both simple GPA effects and the joint effect of college anu C:,A<br />

were investigated. If differential validity for colleges was observeo, tl:e<br />

study design provided for an explanatory phase to identify the characte;‘i+;~;',i.~<br />

of colleges which may be responsible.<br />

Analytic Phase: GPA and College Relationships with Cadet Performance<br />

The analytic phase was conducted to answer two primary questions: 1) is

the GPA a valid predictor of OTS performance; and 2) is the relationship<br />

between GPA and performance moderated by college attended?<br />

Method

Procedure

Data were obtained from archival files maintained on Air Force officers.
An initial sample of 11,619 cadets who entered OTS during the period studied was
identified. Source data for the primary predictor variables were cadets'
4-year undergraduate GPA reported on a 4.0 scale and the college which

345


conferred their baccalaureate degree. Measures of cadet performance were
obtained from various phases of the 12-week OTS program. Reason for
terminating training was used to generate a Pass/Fail dichotomy reflecting
final training outcome for the total sample. Eight additional measures of
performance were available for graduates (N = 9,858). Final Course Grade was
an overall rating of academic success in the training course obtained by


GPA and colleges with cadet performance that were less complex than the one

hypothesized by the starting model. Possible outcomes were an interaction<br />

between GPA and college, but of a simpler functional form (either linear or
curvilinear). Alternate models provided for a joint but noninteracting effect
due to GPA and college (with either a linear, quadratic, or cubic form). In
these cases expected performance would differ by college at fixed GPA values,
but the difference per unit change in GPA would be constant. The least
complex models specified an effect due solely to GPA (linear, quadratic, or
cubic) or solely to college.

To isolate the "best" model, pairs of models were compared to select the

most appropriate model for each criterion.<br />
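To make the model-comparison step concrete, here is a minimal sketch (not the authors' code or data) of how nested regression models of the kind listed above, college terms plus polynomial GPA terms with and without an interaction, can be compared with an F-test. The variable names (gpa, college, score) and the synthetic values are assumptions for illustration only.

```python
# Hypothetical sketch with synthetic data; not the study's models or values.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "gpa": rng.uniform(2.0, 4.0, n),
    "college": rng.choice(["A", "B", "C"], n),
})
# Synthetic criterion: an additive college effect plus a curvilinear GPA effect.
college_offset = df["college"].map({"A": 0.0, "B": 1.0, "C": 2.0})
df["score"] = (85 + 3 * df["gpa"] - 0.4 * df["gpa"] ** 2 + college_offset
               + rng.normal(0, 1.5, n))

# Reduced model: joint but non-interacting effects of college and (quadratic) GPA.
reduced = smf.ols("score ~ C(college) + gpa + I(gpa**2)", data=df).fit()
# Full model: adds a college-by-GPA interaction (the more complex starting form).
full = smf.ols("score ~ C(college) * (gpa + I(gpa**2))", data=df).fit()

# F-test of the restriction: is the interaction needed, or does the simpler model suffice?
print(anova_lm(reduced, full))
```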

Results of GPA Validation<br />

Simple GPA effects were found for all criteria except the Pass/Fail
dichotomy. As shown in Table 1, the bivariate correlations between the OTS
performance criteria and GPA indicate low to medium-low positive


relationships. The highest correlation was observed for Final Course Grade (r
= .31, p < .01) and the lowest correlation for the Pass/Fail dichotomy (r = .01,
p > .05). Because the study focused on identifying a college effect for a
specific criterion only if a GPA effect was found, the Pass/Fail measure was
excluded from further analyses.

Joint GPA and college effects were found for all remaining criteria
except the 6th week OTER. Information about both college identity and GPA
made a unique contribution to prediction of the cadets' training performance.
However, no interaction between GPA and college was detected. Expected
training performance differed by a constant amount at all GPA levels for
graduates of different colleges. The functional form of the GPA-performance

relationship for colleges was linear for three performance criteria and<br />

curvilinear for four performance criteria. Figure 1 illustrates the
representative finding. A curvilinear relationship between GPA and

performance is depicted, and between-college differences in expected<br />

performance are shown to be the same across GPA values.<br />

Table 1. Correlations (uncorrected) of Criteria with GPA

Criterion(a)            r
Pass/Fail              .01
Final Course Grade     .31**
CWT 1                  .19**
CWT 2                  .22**
CWT 3                  .21**
CWT 4                  .22**
CWT 5                  .18**
OTER 6th Week          .07*
OTER 11th Week         .20**

(a) Pass/Fail N = 11,619. Other criteria N = 9,858.
* p < .05.  ** p < .01.

[Figure 1. Relationship between GPA and CWT 5 Score for different colleges. Expected CWT 5 Score is plotted against Grade Point Average (1.00 to 4.00) for Schools 'A', 'B', and 'C'.]

Explanatory Phase: Characteristics Which Account for College Effects<br />

The explanatory phase was accomplished once the results of the analytic<br />

phase showed that the relationship between GPA and cadet success varied by<br />

college. The objective of this phase was to identify variables reflecting<br />

the characteristics of colleges which might underlie the combined effect of<br />

GPA and college. Of interest was whether performance variance accounted for<br />

by colleges was due primarily to the talent of students (college<br />

selectivity) or to the nature of the academic experience (educational<br />

environment). Astin (1962, 1971) showed that both classes of variables can<br />

be used to distinguish colleges, but suggested (1972) that selectivity is

the more important correlate of graduates' future performance.<br />

347<br />



Method

Subjects

The unit of analysis was colleges. Eleven of the 102 institutions were

eliminated because data on all of the predictors could not be obtained.<br />

This reduced the number of colleges analyzed to 91.<br />

College Measures<br />

College Selectivity. College selectivity was defined as a measure<br />

which captured the prestige of the university as reflected by the talent of<br />

the students attracted and accepted to the college. To measure college<br />

selectivity, the average scores of the entering freshman class on

standardized tests (the Scholastic Aptitude Test (SAT) and the American<br />

College Test (ACT)) were recorded. In addition, the selection ratio of the<br />

college (i.e., percentage of applicants accepted) was computed.

Educational Environment. Educational environment measures reflected

academic experiences provided by the university. Measures were percentage<br />

of graduate students, ratio of students to faculty, percentage of full-time<br />

faculty with PhDs, number of volumes in the library, and yearly dollar value<br />

of endowments.<br />

Procedure<br />

The sources of data for the college selectivity and educational<br />

environment predictors were various published documents reporting such

statistics (e.g., American Council on Education, 1983, 1987; The College<br />

Blue Book, 1987; Lehman, 1966; National Center for Educational Statistics,<br />

1987). Data used as criteria reflected the unique contribution of the 91
colleges to the prediction of OTS cadet performance. These values were the

regression weights (b-weights) for the college membership binary variables<br />

from the "best" model in the analytic phase.<br />

Analysis<br />

Regression analyses were used to explore the relative contribution of
the two classes of college characteristics in accounting for the college

effect observed for the seven OTS performance measures. Two models, in<br />

which the b-weights for colleges were regressed on both college selectivity<br />

and educational environment measures (Model 1) and on college selectivity<br />

measures alone (Model 2), were analyzed. These models were designed to test<br />

the hypothesis that the variation in expected performance level observed for<br />

graduates of different colleges was due to college selectivity or the talent
of the student body, not to the educational environment. The predictor sets

included binary and product vectors for the SAT and ACT variables in order to<br />

account for the schools (N = 51) which reported only one test score, either<br />

SAT or ACT. The predictive accuracy of the two models was compared using the
F statistic (p < .01). If the models differed significantly for a criterion,
stepwise regression analyses were also accomplished to identify the most
salient indicators among the available educational environment measures. A
backward elimination method was used to determine which educational
environment measures improved predictability (p < .01).
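A rough sketch of the backward-elimination step, under stated assumptions, follows. It is not the study's analysis; the college-level variable names and the synthetic data are invented, and only the general technique (dropping the least significant predictor until all remaining p-values clear a threshold) is illustrated.

```python
# Hypothetical sketch of generic backward elimination; names and data are invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 91  # one row per college (see the sample described above)
X = pd.DataFrame(rng.normal(size=(n, 7)),
                 columns=["sat", "sel_ratio", "pct_grad", "stu_fac",
                          "pct_phd", "volumes", "endowment"])
# Synthetic criterion standing in for the college b-weights from the analytic phase.
y = 0.6 * X["sat"] - 0.3 * X["sel_ratio"] + rng.normal(scale=0.8, size=n)

def backward_eliminate(y, X, alpha=0.01):
    """Drop the least significant predictor until all p-values are below alpha."""
    cols = list(X.columns)
    while cols:
        model = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvals = model.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= alpha:
            return model, cols
        cols.remove(worst)
    return None, []

model, kept = backward_eliminate(y, X)
print("Retained predictors:", kept)
```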


College Selectivity Versus Educational Environment<br />

As shown in Table 2, the multiple correlations (R) for the college
selectivity and educational environment measures in combination (Model 1)
ranged from .45 to .66. The highest relationships (R = .60 or greater) were
obtained for the Final Course Grade, CWT 2, and 11th week OTER performance
criteria, and the lowest relationships for the CWT 1 and CWT 5 criteria.
The two classes of college characteristic predictors accounted for about 20%
to 40% of the variance (R2) in expected performance due to college

attended.<br />

Table 2. Regression Analysis Results for College Characteristics

                           Model 1:                    Model 2:
                           College Selectivity &       College Selectivity
OTS Performance            Academic Environment        only
Criteria                   R        R2                 R        R2
Final Course Grade         .64      .42                .62      .38
CWT 1                      .46      .21                .42      .17
CWT 2                      .66      .43                .63      .40
CWT 3                      .51      .25                .46      .21
CWT 4                      .57      .32                .53      .16
CWT 5                      .45      .20                .44      .20
OTER 11th Week             .60      .36                .47      .22


Personnel managers using undergraduate GPA as a job selection factor
should be cognizant that the expected future performance of employees may
vary as a function of the college attended. In agencies with selection
systems relying exclusively on GPA, consideration of the selectivity
characteristics of individual institutions holds promise as the basis for a
methodology to adjust for the college effect. Agencies with selection
procedures which include a measure of each applicant's cognitive ability
(i.e., standardized test score) in addition to GPA may find that the
aptitude component captures the performance variance due to colleges.

Barrett, L.E., & Armstrong, S.D. (1989, November). Moderating effects of college characteristics on the
predictive validity of college grade point average (GPA). Paper presented at the 31st Annual
Conference of the Military Testing Association, San Antonio, TX.

Cowan, D.K., Barrett, L.E., & Wegner, T.G. (1990). Air Force Officer Training School selection
system. Brooks AFB, TX: Air Force Human Resources Laboratory.

Peterson's guide to four-year colleges. (1985). Princeton, NJ: Peterson's Guides.

350<br />



Flight Psychological Selection System - FPS 80:
A New Approach to the Selection of Aircrew Personnel

H.D. Hansen
Ministry of Defense, Bonn, Germany

Introduction

The selection of Air Force and Navy flight personnel is a progressive process, commencing

before the enlistment of the candidates (Phases 1 and 2) and continuing after the normal military<br />

training (which lasts for approximately one year) into Phase 3.<br />

The first Phase is a general screening of such factors as Intelligence and Leadership qualities,<br />

carried out in the respective Officer or NCO Selection Centres.

The second Phase is a preliminary flight-aptitude screening, using Computer-based psychological<br />

tests, grading candidates as broadly ‘Suitable’ or ‘Unsuitable’.<br />

The third Phase is more precise, making a final decision as to candidate suitability and further<br />

predicting what particular activity each candidate would be best suited for (e.g. Jet, WSO, Prop,<br />

Helicopter or Navigator).<br />

It consists of 3 weeks Navigation/Academic instruction, 1 week FPS 80 Selection and for those<br />

who have survived thus far, 5 weeks Flying instruction on light prop aircraft, including 18 flying

hours.<br />

FPS 80 is the abbreviation for the Flight Psychological Selection System of the Aviation<br />

Psychology Section, Aerospace Medical Institute of the German Airforce.<br />

As the need was identified to improve the effectiveness and reliability of the Selection System,
FPS 80 was conceptualized. It was then designed and a detailed Functional Specification was

prepared, from which the required Hardware and Software was commissioned.<br />

FPS 80 was installed in July of 1987, from which time it was further tested and standardized. It<br />

was introduced as part of the selection process on the 1st April, 1990.<br />

In this paper, we will concern ourselves with a description and statistical evaluation of the FPS<br />

80 Selection system.<br />

An overview of FPS 80

All those skills which are very difficult or impossible to test in the flying part of the screening,<br />

need to be evaluated, and this is the principal function of FPS 80 - to determine the particular<br />

skills of each individual candidate.<br />

Such particular skills include the multiple tasks required of a WSO, speed of information processing,
estimation abilities in formation flying, and spatial orientation and visualization. FPS 80 is much
better able to assess these particular skills than the later Flying screening.

FPS 80 makes use of a complex simulator-like device, which provides a test-environment very<br />

close to actual flying. The advantage of such a device compared with an aeroplane, consists of<br />

the ability to make an objective measurement of candidate performance in a standardized test<br />

351


situation devoid of external distractions. In this way an adequate performance comparison<br />

between different candidates is provided. In addition a qualitative description of candidate<br />

behaviour may be formulated by observations during the test.<br />

Description of the FPS 80 Test Device<br />

Test position<br />

The two identical test positions are built to resemble cockpits. They contain a seat and the usual<br />

flight controls, viz: stick, rudder, flap-lever, gear-switch, and throttle. These are actual parts<br />

from scrapped military aircraft. In the interest of cost reduction, a stationary cockpit is used.<br />

Impressions of movement originate exclusively from visual inputs.<br />

Conventional flight instruments are depicted on an instrument panel. Three colour VDUs appear<br />

above the panel. These represent the view from the cockpit. The view forwards covers a<br />

landscape of approximately 80 kilometres square. The view includes an airfield and the<br />

surrounding landscape. The scale of the depicted landscape represents the cockpit's current
displacement from it; the speed of change of a display represents the speed of the cockpit, and the
perspective of the objects displayed represents the current orientation of the cockpit. In this way
a realistic impression of motion is conveyed to the candidate.

In the lower third of the central VDU are displayed the following instruments: power, speedo,<br />

compass, horizon, altimeter, vertical speed indicator and G-meter.<br />

The cockpits additionally have a control and warning-panel that gives information about such<br />

things as landing gear (up/unsafe/down), flaps (up/down), parking brakes, stall warning. An<br />

input key-pad is found on the right side. The performance characteristics mirror those of a<br />

standard single engine machine. System parameters may be changed to simulate other machine<br />

types. The two cockpits operate independently of one another.<br />

System Configuration<br />

The FPS 80 system comprises 7 computers linked by a network. One of these is a central<br />

computer and each cockpit is driven by three more. Tests are controlled from the central<br />

computer console from where the test supervisor can start the different test programs, communicate

with the candidates and monitor their progress. He can additionally intercept their visual<br />

displays, and speak with the candidates, singly or severally, by radio link. The candidate<br />

performance data is returned to the central computer where it is stored on tape, later to be<br />

processed on an external computer in combination with the results of the other screening<br />

procedures, to produce a composite performance profile for each candidate. The results from<br />

all candidates may then be statistically analysed.<br />

Test procedure<br />

The FPS 80 Test Procedure consists of 5 missions. Each candidate receives a standard briefing
from the instructor before beginning each mission. A mission is built up of various distinct

manoeuvres, usually starting and ending with a take-off and landing. Every mission has three<br />

phases, viz:<br />

352


1) Demonstration Phase<br />

The control sequences and instruments required for each mission are first explained and<br />

demonstrated. An ideal mission performance is then displayed on the screens and described in<br />

pre-recorded standardized form over the acoustic system.

2) Practice Phase<br />

In the second phase, the candidate attempts the manoeuvre himself with assistance both from
the system (pre-recorded warnings) and from the instructor (optional intervention). The computer
monitors his performance and generates warnings when it strays too far from the optimal

one. (Tolerances are adjustable.) Should his performance diverge unduly, the manoeuvre is<br />

interrupted and starts anew (up to three times).<br />

3) Test Phase<br />

There is no intervention or assistance during the test phase. The only acoustic inputs are normal

Controller communications. The same tolerances apply as during the practice phase, and<br />

automatic interruption and restart will occur in the same way.<br />

The candidate's behaviour is additionally under observation during this phase by a Flight

Psychologist, who subsequently completes an observation log of his performance.<br />

Description of Missions<br />

Mission FPS 01:<br />

- Introduction to the function of the video system, controls and flight instruments.<br />

- Taxiing, Takeoff with Abort, renewed Taxiing to “Number 1 Position”, Take-off and climb<br />

to Pattern-level (1000 ft AGL), Straight and level flight.<br />

- Turns with 20° of bank and 90° direction change. - Turns with 40° of bank and 180° direction
change. - Turns with 60° of bank and 360° direction change.

- Automatic return flight to the airfield with landing.<br />

Mission FPS 02:<br />

- Consists of pattern flying and landings.<br />

Mission FPS 03:

- Take-off and climb to pattern level. Leaving the pattern over the NZP (Navigational Zero<br />

Point) to commence flight proper.<br />

- Navigation flight (1000 ft AGL) with location of targets and the solution of additional tasks<br />

(calculation of course and flight duration per leg). Finally return to airfield and land.<br />

Mission FPS 04:<br />

- Take-off and climb to pattern level. The plane will then be automatically positioned at 6000<br />

ft AGL.<br />

- Recovery from unusual attitudes (nose-up/nose-down). The manoeuvre must be performed

at 5000 ft on a prescribed course and within a given time interval.


- Pursuit of a leading plane such that a given separation be maintained at all times.
- Homing in on a target, pursuit and attack of another plane.

- Finally return to airfield and land.<br />

Mission FPS 05:<br />

An endless tunnel appears on the screen comprising a series of concentric squares and a white<br />

line approaching the viewer through the center of the bottom edges. The squares appear to<br />

approach the viewer by diverging from the centre. The apparent speed of approach of these<br />

squares (which remains constant) simulates the speed of flying through the tunnel. Rotation of<br />

the squares transmits a sensation of banking in the opposite direction. Similarly changes in the<br />

relative displacement of opposite sides (left and right for the tunnel bending, top and bottom for<br />

the tunnel rising and dipping) create effects of the tunnel changing direction and orientation.<br />

These effects communicate themselves to the candidate not as changes in the tunnel however,<br />

but as changes in the attitude of the plane. The alignment of the squares can be restored by the<br />

appropriate control inputs, which in turn restores the impression of level flight.<br />

These effects are accentuated by examination pressure and the feeling of sensory deprivation<br />

caused by a closed cockpit. This is so realistic to some candidates that they experience a sensation<br />

of air-sickness.<br />

Statistical Evaluation<br />

The evaluation of the missions is performed in 3 steps, viz:<br />

1) Data compression<br />

2) Determination of correlations between FPS missions and flight performance in the Screening.

3) Calculation of transformed test results.<br />

Table 1 gives an overview of the number of variables to be processed from each mission.<br />

Table 1: Number of variables per mission<br />

Mission 01: 7 sections of 11 variables.<br />

Mission 02: 11 sections of 11 variables (times 3 circuits).<br />

Mission 03: 18 sections of 11 variables.

Mission 04: 16 sections of 11 variables.<br />

Mission 05: 9 sections of 9 variables.

This gives a total of 895 processed variables for all 5 missions. This implies ca. 130 kByte raw

data per candidate. Thus a condensation of data is necessary to enable evaluation. (Details of<br />

this condensation procedure are to be found in an exhaustive paper on the subject shortly to be<br />

354


published in the “Wehrpsychologische Untersuchungen”.) The condensation required 60 hours

of processing time, and the results were stored in 7 data banks for later ease of access.<br />

In the second stage of evaluation, correlations were made between the individual variables and<br />

the results from the later Flying screening. In this way the variables best able to predict the
results of the Flying screening were highlighted.

In the third stage of evaluation, based on a regression analysis of the most predictive variables<br />

from stage 2, a representative value for each candidate was calculated for the individual sections<br />

of each mission, and also for each complete mission (or in the case of mission 02, for each circuit<br />

of the mission).<br />
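A minimal sketch of evaluation stages 2 and 3 is given below. It does not reproduce the FPS 80 software; the array names and the |r| >= .30 cut-off are assumptions, and the sketch only illustrates screening variables by their correlation with the Flying-screening result and combining the survivors into one regression-based score per candidate.

```python
# Hypothetical sketch of stages 2 and 3; all names, sizes, and the cut-off are assumed.
import numpy as np

rng = np.random.default_rng(2)
n_cand, n_vars = 274, 33                      # candidates x condensed mission variables
mission_vars = rng.normal(size=(n_cand, n_vars))
flying_score = mission_vars[:, :5].sum(axis=1) + rng.normal(scale=2.0, size=n_cand)

# Stage 2: simple correlations between each variable and the Flying-screening result.
r = np.array([np.corrcoef(mission_vars[:, j], flying_score)[0, 1]
              for j in range(n_vars)])
keep = np.abs(r) >= 0.30                      # retain only the most predictive variables

# Stage 3: least-squares regression weights for the retained variables, giving one
# representative value per candidate for this mission.
X = np.column_stack([np.ones(n_cand), mission_vars[:, keep]])
beta, *_ = np.linalg.lstsq(X, flying_score, rcond=None)
mission_score = X @ beta
print("retained variables:", keep.sum(), " example scores:", mission_score[:3].round(2))
```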

The validity of the values calculated for each complete mission was assessed by correlating them with the
results of the Flying screening. All correlations were highly significant, but differed widely

between missions. The fact that the first mission had a lower correlation could perhaps be<br />

explained by the unfamiliarity of the test environment at this early stage of FPS screening.<br />

In a fourth evaluation stage, the fall-out frequencies during Flying screening within groupings
of candidates with similar FPS performances were computed. Table 2 (next page) shows clearly
that candidates with low FPS performances frequently failed the Flying screening.

The second mission appears to have been particularly predictive. Missions 3 and 4 show
irregularities in the middle stages, which could perhaps be explained by the fact that some of
the skills being tested in these missions do not play a part in the Flying screening.

Those particularly at risk in the Flying screening are candidates who scored below 51 in the
FPS (most candidates scored in the range 40 to 76). The group of candidates with the best FPS
results (> 69 FPS points), on the other hand, had a 90% success-rate in the Flying screening.

Conclusion<br />

After exhaustive statistical evaluation, it was possible to conclude that the FPS was capable of
predicting success or failure at Flying screening with acceptable accuracy.

A final evaluation of the success of this method of screening (remembering that the full screening
process consists of all five stages in Phases 1 to 3, as at present Flying screening is being retained)
will only be possible after the collection of sufficient statistical evidence of candidates' subsequent
performance in training and later operational flying. The same applies, of course, to the
other in-flight disciplines for which FPS screening takes place.

To date, second-rate pilot-candidates have been channelled into positions as Weapon System
Officers and Navigators. It is hoped that the specific results of FPS missions 3 and 4 will show
a better correlation with subsequent candidate skills in these specialist activities.

355


Table 2. FPS 80 test results and attrition rates in Flying screening

                              Test results
                 < 52    52-57   58-63   64-69    > 69   total
mission 1
  no of cand.      46      50      98     120      43     387
  attritions       23      11      17      18       1      70
  percentage      50%     22%     17%     15%      2%     23%
mission 2a 1)
  no of cand.      43      44      71      60      56     274
  attritions       26      14      15       5       3      63
  percentage      60%     32%     21%      8%      5%     23%
mission 2b
  no of cand.      44      33      73      67      57     274
  attritions       24      18       8       9       4      63
  percentage      55%     55%     11%     13%      7%     23%
mission 2c
  no of cand.      37      37      63     100      37     274
  attritions       23      12      17      10       1      63
  percentage      62%     32%     27%     10%      3%     23%
mission 3
  no of cand.      35      23      31      34      32     155
  attritions       24       5       2       6       1      38
  percentage      69%     22%      6%     18%      3%     25%
mission 4
  no of cand.      21      16      24      23      40     124
  attritions       13       2       3       3       2      23
  percentage      62%     13%     13%     13%      5%     19%
mission 5
  no of cand.      19      13      25      27      30     114
  attritions        9       3       4       3       0      19
  percentage      47%     23%     16%     11%      0%     17%

1) mission 2 consists of 3 identical patterns



Leadership in Aptitude Tests and in Real-Life Situations

A. H. Melter & W. Mentges
Federal Armed Forces Central Personnel Office, Köln,
Federal Republic of Germany

Introduction

In the aptitude testing of German volunteers for officer and NCO careers small groups

of three or four applicants are given planning tasks to work out sequences for action or<br />

to organize items of information. The applicants have to produce an individual draft of<br />

their task solution. They prepare and give the group a short presentation of some aspects<br />

of the planning tasks, and they have to discuss and to decide their tasks and their<br />

individual solutions at a round table.<br />

Officer applicants must organize<br />

- a leisure activity,<br />

- a floor-plan of a supermarket,<br />

-the land utilization and development of a small town,<br />

- a school prize-giving day, or<br />

- a meeting place for young people.<br />

The rating sheet for group tasks is subdivided into four paragraphs:<br />

- The “written plan” section for making notes on the contents, presentation, accuracy,<br />

and lay-out.<br />

- The “short presentation” section for making notes on comprehensibility, behavior, and<br />

argumentation.<br />

- The “round table discussion” section for making notes on social interactions, plans,<br />

decisions, and behavior in changing situations.

- The “overall rating of the group task” section for making notes on assertiveness, social<br />

competence and cooperation, argumentation and verbal expression, planning and<br />

decisiveness.<br />

The scale is defined as follows:<br />

1 Very good, obviously positive, clearly more positive characteristics than<br />

usually expected,<br />

2 Good, clearly above average, more positive than negative characteristics;<br />

357


3 Completely satisfactory, somewhat above average, more positive than

negative characteristics;<br />

4 Satisfactory, average, positive and negative characteristics are balanced;

5 Adequate, somewhat below average, more negative than positive<br />

characteristics;<br />

6 Just adequate, clearly below average, more negative than positive<br />

characteristics;

7 Unsatisfactory, obviously negative, clearly more negative characteristics<br />

than usually expected.<br />

The computer-assisted planning tasks and computer-simulated planning games consist<br />

of a comparable matrix of methods and dimensions, too (Melter & Geilhardt, 1989). As<br />

a rule military raters use the aptitude criteria achievement, social competence and<br />

cooperation, argumentation, planning and decisiveness. These concepts describe a<br />

range of characteristics denotable as leadership in small groups.<br />

The problem of behavior prediction<br />

Now, psychological aptitude researchers and military raters are confronted with the<br />

problem of whether real-life behavior in squads, platoons or companies can be predicted

from task-generated behavior in artificial testing conditions.<br />

While the predictor situations are sufficiently described, the criteria referring to careers<br />

and jobs still have to be clarified to solve the prediction problem. Psychological research<br />

normally makes use of analyses of job demands. Such analyses produce the criteria by<br />

which leadership, for example in squads, platoons and companies, can be assessed by<br />

other military personnel (instructors, superiors) and teachers at the officer schools, at<br />

the universities of the German Federal Armed Forces, and in field appointments. If the<br />

predictors and criteria are similar and comparable, the results of such analyses will be<br />

reliable and valid. But if there are great differences between both situations, the<br />

psychological aptitude research unit and the personnel department have to look for the<br />

central personal constructs of the criterion situations. But neither psychologists nor<br />

military users are able to claim to have discovered them with one hundred per cent reliability.

Use of real-life situations to establish job demands<br />

One approach to establishing career or job demands that can be translated into measurements
with psychological methods is to issue questionnaires to officers at the officer schools,
the Bundeswehr agencies, and in field appointments. In first analyses we used repertory

grid techniques to question 25 military raters and staff officers from the Central<br />

Personnel Office, 17 officers from the Air Force Officer School, and 15 officers from<br />

the Army Officer School about their personal constructs of apt and inapt young officers.<br />

The aim of those studies was to hear the implicit aptitude theories of these officers about<br />

the new officer generation experienced in their own job environment (Mentges, 1989).


A further objective was to produce a diagnostic process model for determining and

evaluating the aptitude criteria for selecting officer applicants (Behling & Neubauer,<br />

1990). We intend to question officers in the field, too.<br />

The personal constructs are defined in behavioral terms. However, the method does not<br />

allow one to work out unambiguously in which situations the defined behaviors have what
kind of results, success or failure, for the man concerned. Such distinctions are only

possible if we ask about so-called “behavior - situation - results - triangles” in real-life<br />

environments. This means asking about typical situations, about behavior in such<br />

situations and about the effects of this behavior, for example on the soldiers in the squad<br />

entrusted to the officer candidate for the first time in the training unit.

When asking about typical situations for leadership we have to differentiate enormously.

Firstly, the size of the military groups (squad, platoon, company) and the responsibilities<br />

increase during someone’s career.<br />

Secondly, typical situations in peace time, in periods of tension, and in war are different.<br />

Thirdly, we have different typical situations indoors and outdoors. Many further<br />

distinctions are imaginable. It is essential for our problem that while leadership in a<br />

small group will result in success, the same behavior might not be successful in a war<br />

situation. In threatening situations where prompt, precise, and right action is necessary,<br />

there is a need for different leadership qualities from those in situations where there is<br />

no stress (Cardoso de Sousa, 1990).<br />

Predictions for normal and dangerous situations<br />

All military experts concerned with such topics assume that they are unable to reliably<br />

predict leadership in war or to predict the character of that type of officer who would<br />

in fact be able to lead successfully in war simply because the speed, variety, and<br />

unforeseeability of events and behaviors in such crucial circumstances are beyond<br />

precise description and simulation (Oetting, 1988).<br />

On the other hand, there is some evidence that people with a certain pattern of basic<br />

abilities will most probably be unable to hold their own in typical situations. For the<br />

moment, we have left out of consideration the fact that a certain pattern of skills and<br />

knowledge can be generated by training and education.<br />

The psychological and medical assessments of such basic patterns conjoined with the<br />

prediction of success in typical situations are difficult enough, but the educational<br />

assessment of the increase achieved by training and education is incomparably more<br />

complicated.<br />

Let us take an example out of the domain of survival. The analyses of reports given by

survivors of accidents have shown that<br />

- their belief in being rescued,<br />

-the fact that they did not panic,<br />

- their good morale, and<br />

-their will to survive,


each demonstrated in behavior, enhanced their chances of survival (Rüder & Minich,

1987).<br />

Of these four psychological characteristics, only morale and will-power can perhaps be<br />

detected in a basic assessment of volunteers. How can we assess whether morale and
will-power of soldiers can be increased to such an extent through training and education
so that they could survive dangerous situations? It is extremely difficult to predict such

an “ultimate” criterion. And because of that we are unable to base a selection and<br />

placement model on aptitude criteria for extreme situations.<br />

It is by no means the case that predictions for "normal" situations are considerably less
difficult than the predictions for dangerous situations. You only have to think of the

quite “normal” prediction of the superior’s ratings at the end of any military training<br />

course, and of the many imponderable factors that can influence the aptitude and<br />

performance rating of an officer candidate.<br />

The environmental factors accompanying military operations, space missions, rescue<br />

operations, or sports activities can - as dangers - drastically affect the behavior of<br />

individuals concerned and have consequences for the life and limb of both those in<br />

charge and their teams. Although predictions are very difficult, psychologists remain

under an obligation for ethical reasons to contribute to predictions by researching into<br />

aptitude criteria and the characteristics of poorer performance and performance enhancement<br />

due to training, in order to improve the selection, the training, and mission<br />

accomplishment with psychological methods.<br />

One example from the domain of sports activities serves as clarification: when dangerous<br />

situations in mountaineering have been analyzed retrospectively from a psychological<br />

viewpoint, it has been noticed that some behavioral characteristics of the men at

risk brought about the potential accidents of guided groups:<br />

- careless and technically deficient safety measures;<br />

- failure to give precise orders, if any at all;<br />

- unrealistic over-estimation of one’s technical skills and fitness;<br />

- euphoria or fatigue combined with decreasing attention;<br />

- arguments and annoyance.<br />

Accidents happen with increasing probability if such behavioral characteristics appear<br />

in the group, and if environmental factors interact in a fateful manner: The guide climbs<br />

a rock passage with crumbling grips and steps; the second member of the group fails to<br />

take adequate securing measures and at the same time chatters to the third member of<br />

the group without observing the guide, who for his part fails to give precise and pressing<br />

instructions to the group to do things right.<br />

The behavioral result of the leader may be a fall, if the environmental factor “loose grip”<br />

comes to bear, a fall which could mean the fall of the whole group because of the<br />

incomplete and inattentive securing, with fatal consequences for all the members. The<br />

guide should be advised to pay attention to the reliability of the members when selecting<br />


360


his group, to insist on a short check of their communication and securing skills, and to<br />

attach importance to precise and prompt instructions during the climb.<br />

Results of previous job analyses<br />

Which criteria, resulting on the one hand from surveys and on the other from real-life
situations, can be provided by aptitude psychologists for a basic assessment in order to
arrive at concepts and measurements of leadership in the small groups used in aptitude testing?

Surveys with officers from different divisions of the Central Personnel Office (Mentges,<br />

1990) point in a very definite direction that can be paraphrased with<br />

-personal authority and the attending executive techniques,<br />

- assertiveness, taking consideration of the situation and of the people involved,<br />

- cooperation in the sense of commitment to the success of the team,<br />

- comradeship and care;

- courageous and honest acceptance of responsibility.<br />

Anyhow, it does not include the ability to cause conflicts and to test the extent to which<br />

such conflicts can be endured and managed. Ideas of this kind should have been
discarded once and for all from modern group psychology.

References<br />

Behling, A. & Neubauer, R. (1990). Eignungsmerkmale Offizierbewerber (Aptitude criteria for officer
applicants). Abschlußbericht der Industrieanlagen-Betriebsgesellschaft mbH. Ottobrunn.

Cardoso de Sousa, F.J.V. (1990). Leadership under stress: Immediate effects of the aggressive style. Paper

presented at the I.A.M.P.S. conference. Vienna.<br />

Melter, A.H. & Geilhardt, T. (1989). Computer-assisted problem solving as assessment method. Proceedings<br />

of the 31st annual conference of the Military Testing Association (pp. 129-134). San Antonio:

Air Force Human Resources Laboratory.<br />

Mentges, W. (1989). Implizite Eignungstheorien als Bestandteil der Anforderungsanalyse im Assessment
Center (Implicit aptitude theories as part of job analyses for assessment centers). Diplomarbeit im
Fach Psychologie an der Philosophischen Fakultät der Rheinischen Friedrich-Wilhelms-Universität
Bonn.

Mentges, W. (1990). Die Erhebung von impliziten Eignungstheorien als Beitrag zur Anforderungsanalyse
für den Offizierberuf (Survey of implicit aptitude theories as a contribution to job analysis). Köln:
Arbeitsbericht des Personalstammamtes der Bundeswehr.

Oetting, D.W. (1988). Motivation und Gefechtswert - Vom Verhalten des Soldaten im Kriege. (Motivation

and combat effectiveness - On the behavior of soldiers in war). Frankfurt und Bonn: Report<br />

Verlag.<br />

Rüder, K.-H. & Minich, I. (1987). Psychologie des Überlebens - Survival beginnt im Kopf. (The

psychology of survival - Survival begins in the mind.) Stuttgart: Pietsch.<br />

361


Computer-based Assessment of Strategies in Dynamic Decision Making<br />

Wiebke Putz-Osterloh
University of Bayreuth

1. Introduction

In psychological testing, computers are used primarily as economically efficient tools to<br />

administer tests and to analyze and store individual data. This type of testing based on classical<br />

tests is not the subject of this paper, however. Instead, I intend to speak about the uses of<br />

computer programmes to simulate complex situations that call for dynamic decision making<br />

(Kleinmuntz, 1985) or for complex problem solving as Dörner defines it (1978). In the

following, I will first discuss some reasons for complementing classical tests of intelligence<br />

by other methods to extend the range of intellectual demands. Secondly, I will mention three<br />

conditions that should be controlled if one intends to assess individual differences in complex<br />

situations. Then I will summarize empirical results concerning individual differences in<br />

problem solving strategies. Finally I will discuss some difficulties encountered in estimating<br />

the external validity of strategies.<br />

2. Reasons for expanding approaches to intelligence testing<br />

Classical tests of intelligence (whether computer-based or conventional paper-and-pencil)<br />

suffer from some common restrictions with respect to the intellectual demands they cover:<br />

- Test items are static: Items have to be answered independent of the answers given previously<br />

or to be given later.<br />

- Test items are transparent and well defined: Individual differences in knowledge used and<br />

strategies applied must be eliminated to make sure that only one single solution to each item<br />

can be evaluated as the correct one.<br />

- Intelligence is measured by the sum of the correct solutions to items that are to be solved as<br />

quickly as possible: Time consuming processes such as the use of heuristic strategies are not<br />

analyzable.<br />

-Answers to test items have to be selected rather than constructed: Although in real-life<br />

situations the rule is that one has to search for decision alternatives first and to select one of<br />

these afterwards, such search processes are excluded from test-intelligence.<br />

- Applicants often do not accept tests of intelligence as valid or fair predictors for personnel<br />

selection. One approach to overcome the restrictions mentioned is to assess individual<br />

behavior in multiple “real-life situations and exercises” as it is conceptualized by assessment<br />

center methods.

3. Conditions for the assessment of individual differences in decision making strategies<br />

The following conditions are not met or even controlled when using assessment center methods:<br />

- In complex situations there are multiple goals to be reached. Individual differences in decision<br />

making depend on specific defined goals. If individual behavior is to be assessed, the goals<br />

of each subject have to be controlled; otherwise the effectiveness ratings of individual<br />


362


behavior will be invalid. This condition is violated in unstandardized group discussions and

in role-taking games.<br />

-In complex situations different strategies are possible, leading to different outcomes. Therefore,<br />

data on strategies should give more information about individual differences than<br />

performance scores alone would; otherwise the analysis of performance alone would suffice.<br />

With assessment center methods, data on strategies and performance are interrelated.

- Analyses of strategies are most informative if the data are not predicted by classical tests or<br />

other performance scores. They are useful if they are generalizable to other situations.<br />

4. Empirical studies using simulated dynamic situations*<br />

4.1 The simulated situations and their demands<br />

In our empirical studies two different simulated situations are used. The first system simulates<br />

a small industrial company which produces and sells textiles. The system consists of 24<br />

variables, of which 11 input-variables can be changed directly by decisions made by the<br />

subjects, including the volume of raw materials to be bought, the selling prices, the amount of<br />

advertising, the number of workers, etc. Subjects are asked to aim at three goals while

controlling the system:<br />

-to make as much profit as possible,<br />

-to increase the company’s capital from beginning to end,<br />

-to pay the workers the highest wages possible.<br />

These three goal variables are used in combination to rate the performance in system control.<br />

The subjects are asked to control the system for 15 simulated months and to decide what<br />

changes should be made in what input variables. As the experimenter operates the computer,<br />

the subjects have to ask questions about the actual state of the variables, and to communicate<br />

their decisions to the experimenter. So, while the subjects are thinking aloud, data can be gained<br />

in quite a natural manner.

The second system simulates a forest region that is to be protected against fire. The subjects<br />

are asked to take on the role of a fire chief, giving different commands to 12 fire fighting units<br />

(using a mouse). The forest, the units, and the fires are displayed on a graphics terminal in front<br />

of the subjects. The goal is to minimize the area that is burned down (see also Brehmer, 1987).<br />

Again, the same criterion is used to rate performance. The system is to be controlled for 100<br />

time intervals, each of which lasts at most one minute.

Despite differences in the mode of control the two systems have the following demands in<br />

common:<br />

* This research is supported by grants from the Federal Ministry of Defense in Bonn, Federal<br />

Republic of Germany<br />

363<br />



(a) The systems are complex: This means that they contain many variables that are interconnected<br />

by a relational network rather than by single unidirectional relations. Given one input<br />

change, the network between the variables causes not only a main effect but several side effects<br />

that also have to be taken into consideration.<br />

(b) The systems are nontransparent: This means that the relational network connecting the<br />

variables one to another is not shown to the subjects. Therefore the subjects have to generate<br />

hypotheses about the effects of their decisions, which they should then test against the feedback<br />

data.<br />

(c) The systems are dynamic: This means that the variables change their state over time, even

if there is no input change. As a consequence, the effects of input changes differ depending on<br />

the actual system states.<br />

(d) The systems are meaningful: This means that the variables and their interrelations are<br />

implemented in a system to correspond to a domain of reality. The subjects can use their<br />

domain-related knowledge to generate hypotheses.<br />
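To make these four properties concrete, the following toy sketch simulates a drastically reduced "company" for 15 months. It is not the 24-variable system used in the studies; every variable and coefficient is invented, and the hidden update rule merely stands in for the nontransparent relational network that subjects must discover.

```python
# Hypothetical toy system for illustration only; not the textile-company simulation itself.
import numpy as np

rng = np.random.default_rng(3)
state = {"stock": 100.0, "demand": 80.0, "capital": 1000.0}

def step(state, price, advertising):
    """One simulated month. The subject sees only the new state, not these rules."""
    demand = max(0.0, state["demand"] + 0.5 * advertising - 8.0 * (price - 10.0)
                 + rng.normal(0, 3))                     # side effects of both inputs
    sales = min(state["stock"], demand)
    stock = state["stock"] - sales + 60.0                # fixed production per month
    capital = state["capital"] + sales * price - advertising - 60.0 * 6.0
    return {"stock": stock, "demand": demand, "capital": capital}

for month in range(15):                                  # 15 simulated months
    price, advertising = 10.0 + rng.normal(0, 1), 20.0   # stand-in for the subject's decisions
    state = step(state, price, advertising)
print("final capital:", round(state["capital"], 2))
```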

4.2 Control strategies and derived measures<br />

Due to the differences between these demands and the demands of test items, it is to be expected,<br />

and is substantiated by empirical data, that performance in system control is not predictable by<br />

intelligence test scores (see Dörner & Kreuzig, 1983; Dörner, 1986; Funke, 1983; Putz-Osterloh,

1981).<br />

As Dörner (1986) argues, strategies in system control are determined by a superordinate type

of intelligence, the so-called “operative intelligence”. This type of intelligence refers to the<br />

construction and adaptive use of, and control over, subordinate processes such as information<br />

gathering, hypothesis testing, planning, and decision making.<br />

Different parameters of individual strategies are combined to evaluate the control over<br />

subordinate processes, e.g. the frequency of correct verbalized hypotheses (corresponding to

system reality) and the rareness of false or irrelevant ones. These parameters are analyzed and<br />

summed up over subsequent time intervals to evaluate control and adaptation over time.<br />

In the following, two examples of complex abilities which can be diagnosed from decision-making

and from thinking-aloud data are defined and operationalized.<br />

Ability to organize<br />

High organizing ability is defined by the frequency of prospective decisions to prevent<br />

undesirable system states, the rareness of false decisions, and the coordination of different<br />

decisions to reach more than one goal.<br />

In the economic system, the following parameters are combined: the rareness of isolated<br />

decisions, the frequency of central decisions which directly influence one goal variable, and<br />

the frequency of coordinated (in relation to the goal variables) decision patterns over time.<br />

364


In the fire fighting system prevention is realized by the number of units distributed over the<br />

area before a fire is seen. False decisions mean forgetting to let the units search for and put out<br />

fires by themselves.<br />

Coordination is measured simply by the number of changing commands in the face of new<br />

fires throughout the game.<br />

Ability to decide<br />

A high degree of decision-making ability means the capability to plan in a goal-directed manner<br />

and to realize decisions quickly and precisely.<br />

The following aspects are combined in the economic system: The time to control the system,<br />

the frequency of postulated correct effects of decisions, and the rareness of decisions that do<br />

not work in the system.<br />

In the fire fighting system, the speed and accuracy of decision-making are rated in combination.<br />

This means that the number of new fires that are dealt with in precise commands is summed

up and weighted by the average time lag between the time of the fire and the time that the<br />

corresponding command is given.<br />
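One possible operationalization of this combined speed-and-accuracy measure is sketched below; the event format and the exact weighting are assumptions chosen for illustration, not the scoring rule actually used in the studies.

```python
# Hypothetical sketch of a decision-making score; format and weighting are assumed.
from statistics import mean

def decision_making_score(fire_times, command_times):
    """Count fires answered by a precise command and weight the count by response speed.

    fire_times    -- times (in seconds) at which new fires appeared
    command_times -- time of the first precise command addressing each fire,
                     or None if the fire was never dealt with
    """
    lags = [c - f for f, c in zip(fire_times, command_times) if c is not None]
    if not lags:
        return 0.0
    handled = len(lags)                   # fires dealt with by a precise command
    avg_lag = mean(lags)                  # average delay between fire and command
    return handled / (1.0 + avg_lag)      # more fires handled, faster -> higher score

# Example: four fires, three of them answered after 5, 12, and 8 seconds.
print(decision_making_score([10, 40, 70, 90], [15, 52, None, 98]))
```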

4.3 Empirical results<br />

4.3.1 Estimates of reliability<br />

Concerning the economic system, retests between different trials of system control do not seem

to be appropriate. Here we can expect content related changes in strategies that may influence<br />

performance without being attributable to a lack of reliability. Empirical results from two<br />

studies show stability of medium-level strategies, either accompanied or not accompanied by<br />

stability in performance (see Strohschneider, 1986; Funke, 1983).

In the fire fighting system, content-dependent changes in strategies are not to be expected. In

one experimental study (N = 50 university students) two versions of system parameters were<br />

constructed which differed in the number and timing of new fires. The subjects had to control<br />

each version for three trials in one session each. The correlations are lower between the first<br />

set of trials than between the second set. Between the last two trials in the second version, all<br />

correlations are higher than .80, referring to performance as well as to organizing and<br />

decision-making ability. In a second study (N = 80 university students), one system version<br />

had to be controlled for four trials. Performance data between the last two trials are correlated<br />

.84, whereas organizing and decision-making ability are correlated .79 and .76, respectively.

These data on stability are accompanied by significant gains in performance as well as in<br />

strategies from the first to the last trial. This is equally true for both studies.<br />

4.3.2 Data on internal validity<br />

As has been mentioned above, it should be tested whether the subjects are aiming at comparable<br />

goals. If the goals are only vaguely defined, the subjects will probably define different specific<br />

goals for themselves. Consequently, in our studies the subjects are given specific goal variables<br />


which should be influenced in a specified direction. The objectively defined performance is<br />

correlated with subjectively rated success after system control. In two studies all correlations<br />

are highly significant: For the economic system the correlation is .52 (N = 100) and .48 (N =<br />

48), and for the fire system it is .75 (N = 50) and .79 (N = 80).<br />

An important characteristic of performance in system control refers to its partially ambiguous<br />

meaning as performance level is not equivalent to a specific strategic variant. Following these<br />

arguments, the internal validation of identified strategies should be the proof that these<br />

differences are systematically related to performance level, whereas different strategic measures<br />

shouId not be correlated too highly. In our studies there is clear evidence that differences<br />

in strategies are systematically related to performance. Between studies there are differences<br />

in the amount of common variance. In the economic system, decision-making and organizing<br />

ability is correlated positively with performance, but this is not always significant (decisiveness:<br />

.28 and .20; organizing ability .50 and .13; N = 48, N = 100). For fire fighting the<br />

correlations are higher: decision-making ability with performance .58 (N = 50) and .45 (N =<br />

80); organizing ability .58 (N = 50) and .63 (N = 80).

Further questions aim at the relation between the two strategic measures. In the economic<br />

system, no systematic correlation between decision-making and organizing ability is found (N<br />

= 48, N = loo), whereas in fire fighting there is either no relationship at all or no more than<br />

9% common variance between the two measures (N = 80; N = 48). Finally, the generalizability<br />

of strategies and performance between the two systems was also tested. In two independent<br />

studies, one group of subjects first controlled the fire system and then the economic system,<br />

while the other group worked the systems in the reverse order. A sequence effect is replicated<br />

in the two studies: If the subjects control the fire system first and the economic system<br />

afterwards, differences in performance as well as in organizing ability are correlated systematically<br />

between the two systems (N = 25; N = 30), whereas no systematic correlations are<br />

found if the systems are controlled in the reverse order. Despite some ambiguities in interpreting<br />

the sequence effect, these data give evidence of the generalizability of strategies in system<br />

control.<br />

4.3.3 Data on external validity

If the systems do represent valid dynamic situations, experts in the simulated domain of reality
should do better in controlling a system than novices do. As is to be expected, in two

independent studies (Putz-Osterloh, 1987; Putz-Osterloh & Lemme, 1987), university professors<br />

(N = 7) and selected postgraduate students in management science (N = 22) systematically

used more efficient strategies and achieved better performance scores in controlling the<br />

economic system than unselected students (N = 29) did. For the latter subjects, the intelligence<br />

test scores were controlled; they are not correlated with success in system control.<br />

Following the logic of this expert-novice paradigm, in a further study, field-grade officers<br />

(participants in a command and staff course) (N = 27) were compared with unselected students<br />

(N = 30) in controlling the fire system.<br />

Against expectations, no systematic differences between the two groups were found. Do these

negative results falsify a possible external validity of the system to predict success in higherlevel<br />

military careers? There are two arguments that make me inclined to respond to this<br />

question with a negative answer. First, the military subjects are not homogeneous with respect<br />

366



to their decision-making behavior. Instead, in some parameters they show greater variance than<br />

students do. Second, some military subjects reported after the tests that they used the commands<br />

in accordance with their specific military education, and that use of such knowledge hinders a<br />

successful control. In contrast to this, other subjects did learn the specific conditions implemented<br />

in the fire system, and they did well. These data shed light on the different demands<br />

of the fire system, depending on the specific knowledge used while controlling it. Further

investigations are needed to specify system demands and the strategies required to deal with<br />

them successfully.<br />

5. Conclusions<br />

(1) There are individual differences in intellectual abilities that are not covered by the usual<br />

intelligence tests. These differences may be of significance for personnel selection.<br />

(2) There are strategic differences in system control that are related to performance; they are<br />

reliable if the subjects are allowed to control a system in repeated trials.<br />

(3) Simulated systems realize complex demands that are standardized and replicable. Therefore,

systems offer great advantages over standardized group situations.<br />

(4) Besides some evidence of the external validity of strategies and performance in system
control, further theoretical and empirical work needs to be done to specify the demands of
real-life situations and their correspondence with system demands.

(5) Far from being able to predict precisely what strategies in system control imply for behavior<br />

in real life situations, I consider the reported approach to be worth further pursuit.<br />

References

Brehmer, B. (1987). Development of mental models for decision in technological systems. In J. Rasmussen, K. Duncan, & J. Leplat (Eds.), New Technology and Human Error (pp. 111-142). Chichester: Wiley.

Dörner, D. (1986). Diagnostik der operativen Intelligenz. Diagnostica, 32, 290-308.

Dörner, D. & Kreuzig, H.W. (1983). Problemlösefähigkeit und Intelligenz. Psychologische Rundschau, 34, 185-192.

Dörner, D. & Reither, F. (1978). Über das Problemlösen in sehr komplexen Realitätsbereichen. Zeitschrift für experimentelle und angewandte Psychologie, 25, 527-551.

Funke, J. (1983). Einige Bemerkungen zu Problemen der Problemlöseforschung oder: Ist Testintelligenz doch ein Prädiktor? Diagnostica, 29, 283-302.

Kleinmuntz, D.N. (1985). Cognitive heuristics and feedback in a dynamic decision environment. Management Science, 31, 680-702.

Putz-Osterloh, W. (1981). Über die Beziehung zwischen Testintelligenz und Problemlöseerfolg. Zeitschrift für Psychologie, 189, 79-100.

Putz-Osterloh, W. (1987). Gibt es Experten für komplexe Probleme? Zeitschrift für Psychologie, 195, 63-84.

Putz-Osterloh, W. & Lemme, M. (1987). Knowledge and its intelligent application to problem solving. The German Journal of Psychology, 11, 286-303.

Strohschneider, S. (1986). Zur Stabilität und Validität von Handeln in komplexen Realitätsbereichen. Sprache & Kognition, 5, 42-48.

367


A Special Approach in Assessment-Based Personnel Selection

G. Rodel

German Naval Volunteer Recruiting Centre<br />

Wilhelmshaven, Federal Republic of Germany<br />

Introduction:<br />

Due to the lower birth rates in the past and the politically based effects in the present, the FRG Armed
Forces have to deal with shrinking numbers of volunteers. The German Navy's efforts to exploit personnel
resources are focusing attention especially on draftees.

To become a temporary-career volunteer in the Federal German Navy, there are three different ways of
enlistment:

The first way involves civilian volunteers applying to the Naval Volunteer Recruiting Centre (NVRC),
where their aptitude for a temporary-career enlistment is tested (selection) prior to their placement in
the Navy. 65% of all temporary-career volunteers enter the Navy this way, via the NVRC.

The second way is the recruitment of conscripts serving in field units. About 10% of the temporary-career
volunteers are recruited in this way.

The third possibility of becoming a temporary-career volunteer in the Navy is through the so-called
"Information and counseling campaign" (IBA). I would now like to give a more detailed account of this
model of recruitment.

The IBA covers the quarterly temporary-career volunteer requirements which have not been met
by the NVRC and at troop level. If the NVRC enlists a great number of volunteers for a specific quarter,
the complementary recruitment requirements to be met by the IBA are correspondingly smaller. Thus, the
number of volunteers that have to be recruited by the IBA is subject to fluctuations. Usually, the percentage
lies between 25 and 35 percent of the total requirement for temporary-career volunteers.

In this context I would like to give you some figures underlining the importance of the IBA for the German
Navy:

The Navy has a strength of approximately 29,000 soldiers (not counting officers), of whom 3,000 are in their
basic military service, 16,000 are temporary-career volunteers, and 7,800 are regulars.

Every year, about 1,000 soldiers are recruited by the IBA as temporary-career volunteers. This amounts to
about a quarter of the annual requirements.

The military training system of the Federal German Navy presents great advantages for the realisation
of such a recruitment campaign. Training is provided centrally at only nine training centres - so-called
schools - with the maximum distance between these centres being 500 kilometres. All Navy training
courses are held at these training centres. These courses permit us a focused approach to all
students for the purpose of recruitment, examination and placement.

Another advantage is the central personnel management in the Navy under the responsibility of the<br />

Navy Enlisted Personnel Office, which keeps us informed about the specific requirements of the Navy<br />

for every quarter. In this way, we can steer the applicants for a placement in specific tasks or jobs.<br />

368


In the Federal Armed Forces, this campaign is unique, and feasible only in the Navy for the reasons I<br />

have explained earlier.<br />

The system of NVRC selecting personnel from the field units for extended military service in the Navy<br />

has already existed for more than 22 years. However, until three years ago, the field units had not been<br />

involved directly in this selection procedure. This task had been performed exclusively by NVRC.<br />

That means that the field units were not sufficiently concerned about recruitment, counseling and<br />

selection of new personnel, leaving this task to other navy institutions such as the Navy Enlisted Personnel<br />

Office, the Naval Office and the NVRC.<br />

This campaign is of particular importance especially now, in a period marked by a drop in personnel
owing to age groups with declining birth rates and to a lack of motivation and of insight into the necessity of
armies in the face of the detente in West-East relations. This new procedure should lead to an active
participation of superiors in field units as multipliers in this process of enlisting, counseling and recruiting.

Recruits are approached for a temporary-career enlistment already in the second month of their<br />

basic training.<br />

As the readiness to volunteer for a temporary-career enlistment is greatest during
the first four months of basic military service, it is absolutely necessary to conduct the IBA during this
period.

Therefore, testing takes place in situ at the basic training unit.<br />

Method:<br />

During the first phase of IBA, officers go to the nine basic training garrisons every quarter year in order<br />

to recruit (advertise) volunteers for enlistment in the Navy by means of film, lectures and counseling,

and to inform them about military and vocational possibilities.<br />

During the second phase, psychologists go to the different garrisons two weeks later in order to examine

the recruits who, during the first phase, have shown an interest in a temporary-career enlistment.<br />

Under the stipulations of the new procedure governing the recruitment of the suitable personnel for<br />

the forces, the task has to be performed jointly by the NVRC and the forces.<br />

The NVRC psychologists have been entrusted with this task for reasons of ensuring the application of<br />

uniform standards to the evaluation of applicants with or without prior service concerning their aptitude<br />

for a temporary-career enlistment and because of the fact that these psychologists have many<br />

years’ experience in personnel selection testing.<br />

The psychologist as well as the superior in the unit are directly and equally involved in the responsibility<br />

for the recruitment of personnel.<br />

By including the forces, this new methodology also takes into account the fact that the validity, i.e. the
quality of a statement on the aptitude of a person, increases considerably if the person concerned is
evaluated separately and independently, as compared to those cases in which observation, examination
and decision are made jointly and simultaneously.

For an evaluation of the applicant during the psycho-diagnostic interview, the following documents are<br />

available to the psychologist:<br />

369


Medical certificate (exclusions from certain assignments)<br />

Aptitude test results<br />

General application documents<br />

School reports<br />

Testimonials<br />

Curriculum vitae

First, the new superior is initiated into the procedure and trained as a rater by the psychologist. On
his own responsibility and independently, he will observe, judge and evaluate the applicant's military conduct
and qualification for a temporary-career enlistment.

The superior's aptitude statement must have been completed independently before the psychologist
starts the aptitude assessment, which is based on the following documents:

1. Application documents.
2. School reports and reports of professional performance.
3. Declaration of pending proceedings and financial liabilities.
4. Contributions to an efficiency assessment: section and platoon leaders make contributions based on
   their observations, judgements and evaluations of the applicant's military conduct in the following
   military areas of activity:

- in general and specialized instruction
- in practical technical training
- in hand weapon training
- during drills
- in physical training
- in march training
- in field training

Based on the contributions to an efficiency assessment and on his own conclusions from a personal interview<br />

with the applicant, the superior has to evaluate the following aptitude characteristics:<br />

- devotion to duty<br />

- comradeship<br />

- technical abilities<br />

- self assertion<br />

From the documents and the results of the psycho-diagnostic interview, the psychologist evaluates the<br />

aptitude characteristics:<br />

- initiative<br />

- motivation to perform<br />

- articulateness (verbal comprehension and expression)<br />

- judgement<br />

The characteristics “sense of responsibility” and “performance under stress” have to be judged by both<br />

the psychologist and the superior.<br />

Four gradations are at the superior's and the psychologist's disposal for their recommendations.

After the psychologist and the superior have made their evaluations independently, this commission
prepares a joint decision on acceptance or rejection of the applicant for a temporary-career enlistment.
Then it is the psychologist's task to determine a suitable placement for the applicant and to discuss it in
detail.

370<br />



Evaluation of the counseling procedure

For the time being, a long-term investigation of validity is not yet available, as the new
procedure has existed for only three years and soldiers have not held their posts in the units long enough to see
whether the counseling procedure has proved its worth.

However, a comparison of the different recruitment procedures of the NVRC and the IBA already
permits a statement on the quality of the new procedure. In this case, NCO training course results obtained
by soldiers recruited for the Navy by the NVRC can be compared with those of soldiers recruited through the
IBA.

It was to be expected that the results of the training course would not differ significantly, as the
psychologists involved are the same in both cases and can make use of their many years' experience
of test methodology. However, the results also confirm the application of uniform standards to both procedures.

A further confirmation of the new IBA procedure comes from an opinion poll about personal involvement
in and acceptance of the new procedure, which had the following result: out of 84 superiors, only 3 officers
had a negative or indifferent opinion about the way the IBA is practised now.

Comparing the absolute figures of recent years makes no sense because of the decreasing number of
applicants volunteering for the Navy. But if you look at the relative frequencies, the new procedure has
succeeded. The enlistment rate was about 77.8 % in 1986 and increased to 90.4 % in 1989. The difference is
not statistically significant. For the Navy it means, however, that the absolute number of enlistments has
remained nearly constant over the last years, even though the total number of applicants has declined.

Conclusions:<br />

I cannot judge whether this procedure designed to recruit suitable personnel for the Navy can also be
transferred to other navies. According to the Naval Staff, this procedure has proved its worth in the
German Navy. The forces and their superiors feel that they are more actively involved in
the process of recruiting and therefore intensify their counseling efforts for individual applicants,
thus acting as multipliers.

371<br />




TROUBLESHOOTING ASSESSMENT AND ENHANCEMENT (TAE) PROGRAM:
TEST AND EVALUATION RESULTS *

Paper Presented by Dr. Harry B. Conner,
Navy Personnel Research and Development Center, San Diego, CA 92152-6800

32nd ANNUAL MILITARY TESTING ASSOCIATION CONFERENCE
November 5-9, 1990, Orange Beach, Alabama

Nauta (1984) reported on a number of difficulties associated with the U.S. Navy's ability to
maintain its weapons systems. He reported the costs of poor performance of maintenance personnel and
recommended areas requiring investigation if performance of these personnel was to improve. At about the
same time at the Navy Personnel Research and Development Center (NPRDC), we determined that one of
the difficulties we had encountered in the test and evaluation of an ongoing project (the Enlisted Personnel
Individualized Career System-EPICS) was that we had no way of comparing maintenance personnel in the most
important aspect of their performance: troubleshooting of the hardware system. We realized that we needed
an objective way to evaluate personnel performance in the skill of troubleshooting. A literature search
supported the contention that most research and development efforts in this area start with a premise of a
known expert, journeyman/master, or experienced troubleshooter when in fact these are defined rather than
empirically determined. Therefore, we concluded that efforts to improve maintenance personnel
troubleshooting performance were futile until we could empirically and objectively define how a good
troubleshooter performs.

Approach. We addressed this evaluation issue first with a feasibility study (Conner 1988, 1987) followed
by a more structured investigation, the Troubleshooting Assessment and Enhancement (TAE) program. The
TAE objective was to design, develop, test, and evaluate a low cost troubleshooting evaluation capability.
The model (Figure 1) we used in our investigation shows that maintenance is just one of a number of
activities associated with a hardware system. Within the area of maintenance, one can perform preventative
or corrective maintenance. Within corrective maintenance, one troubleshoots or repairs. Specifically, we
focused on the skill of troubleshooting, which we considered to be a skill of problem solving requiring abstract
conceptualization capabilities.

[Figure 1. Hardware Activity to Troubleshooting: hardware system interactions (construct, install, operate,
maintain); maintenance divides into preventive and corrective maintenance; corrective maintenance divides
into troubleshooting and repair.]

With 25 subject matter experts, we developed a list of factors to be used to evaluate the proficiency of a<br />

troubleshooting technician in a high tech environment; that is, systems having state-of-the-art electronics and<br />

computers requiring troubleshooting. Next, we sent our initial factors list with definitions (shown in Table 1)<br />

to 1200 operational hi-tech personnel for ranking. The results were then weighted by a jury of experts (on the<br />

system under investigation). Once the factors were weighted, a scoring methodology was developed. Table<br />

2 provides the results of the factor development, weighting, and TAE scoring scheme. Our literature search<br />

caused us to add a tenth factor: redundant checks.<br />

TABLE 1. Factor Definitions

Rank  Factor                      Definition
 1.   Solution                    Problem is correctly solved; fault is identified.
 2.   Cost (Incorrect Solutions)  Number of Lowest Replaceable Units (LRUs) incorrectly identified as faulty.
 3.   Time                        Total minutes from login to logout taken to find the fault.
 4.   Proof Points                Test points that positively identify LRUs as faulty.
 5.   Illogical Approaches        Inappropriate equipment selection.
 6.   Invalid Checks              Inappropriate test at appropriate test point.
 7.   Out-of-Bounds               Inappropriate test point was selected.
 8.   Test Points                 Total number of valid reference designator tests.
 9.   Checks                      Total number of tests performed at all test points.
10.   Redundant Checks            Same test performed at same point during the episode.

* The opinions expressed in this paper are those of the author, are not official, and do not necessarily reflect the views of the Navy Department.

372<br />



TABLE 2. Ranking, Weighting, and Scoring for Troubleshooting Evaluation Factors

Rank  Factor                 Weight   Scoring Scale   Scoring (Per event)
                                      (Max Points)
 1    Solution               42.78        --          -100 for fail to find
 2    Cost (Incorrect Sol)   13.13        (?)         -0.5 X ea NFR LRU
 3    Time                   11.80       20.62        -0.5 X ea Minute
 4    Proof Points            9.88       17.23        -% X ea Proof Pt missed
 5    Illogical Approach      6.87       12.01        -6.0 X ea Illogical App
 6    Invalid Checks          4.68        8.18        -0.8 X ea Invalid Check
 7    Out of Bounds           4.00        8.99        -0.6 X ea Out of Bounds
 8    Test Points             3.21        5.61        -0.5 X # of Tests
 9    Checks                  3.08        5.38        -0.5 X # of Checks
10    Redundant Checks        tbd         tbd         to be analyzed

Scoring is designed to discriminate between levels of troubleshooting proficiency: failure to solve the
problem results in a score of 0, while solving the problem results in a score of 100. There is no partial score
for factor 1. The ability to discriminate between levels of troubleshooting proficiency lies in the scoring of the
remaining factors. Weights for the factors were converted into a scale equaling 100 points. The final score for
each subject equals 100 points minus the sum of points lost for each factor. The minimum score is 0; that is,
there are no negative scores. The scoring criteria for each factor, also shown in Table 2, are the weights that
were used in the TAE episodes to evaluate and diagnose troubleshooting proficiency levels. The cost factor
was changed to incorrect solutions to more accurately describe the actual behavior.
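A minimal sketch of this scoring rule is given below. The per-event deductions are taken from Table 2 where
they are legible (the proof-point deduction is omitted because its per-event value is not readable), and all
function and parameter names are illustrative rather than taken from the TAE software.

    # Sketch of the TAE episode scoring rule: 100 points minus per-event
    # deductions, floored at 0; no partial credit if the fault is not found.
    # Deduction weights follow Table 2 where legible; names are illustrative.

    def tae_episode_score(found_solution, incorrect_lrus, minutes,
                          illogical_approaches, invalid_checks,
                          out_of_bounds, tests, checks):
        if not found_solution:
            return 0.0                       # factor 1 has no partial score
        lost = (0.5 * incorrect_lrus         # cost (incorrect solutions)
                + 0.5 * minutes              # time
                + 6.0 * illogical_approaches
                + 0.8 * invalid_checks
                + 0.6 * out_of_bounds
                + 0.5 * tests                # test points
                + 0.5 * checks)              # checks
        return max(0.0, 100.0 - lost)

    # Example: a solved episode with a few wasted actions scores in the 70s.
    print(tae_episode_score(True, incorrect_lrus=1, minutes=35,
                            illogical_approaches=0, invalid_checks=2,
                            out_of_bounds=1, tests=6, checks=10))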

Once we had determined the factors and scoring scheme, we selected and constructed practical
troubleshooting episodes that provided a valid representation of the hardware system being used in the
study. Our hardware system was the U.S. Navy's communications system, the Navy Modular Automated
Communications System/Satellite Communications (NAVMACS/SATCOM). To construct TAE
troubleshooting episodes, we focused on the fault diagnosis/problem solving behaviors (Table 3) that military
schools have identified in their six step troubleshooting process (Conner 1986, 1987).


TABLE 3. Six Step Troubleshooting Process<br />

1. Symptom Recognition<br />

2. Symptom Elaboration<br />

3. Probable Faulty Functions<br />

4. Localizing Faulty Function(s)<br />

5. Isolating Faulty Circuit<br />

6. Failure Analysis.<br />

Although the design and delivery of the troubleshooting episodes did not require a computer, the amount<br />

of data made it obvious that the only efficient and cost effective approach would be utilization of microcomputer<br />

delivery and data gathering. Also, to keep developmental and hardware costs down, we limited ourselves to<br />

using off-the-shelf technology. We also reduced the “troubleshooting universe” of the episodes so that a<br />

standard microcomputer memory could handle data.<br />

The model developed for the troubleshooting activity on a given piece of hardware (shown in Figure 2)
provides a TAE Factors Model for "System Troubleshooting." The model works as follows: Once a system
is determined to be inoperative, the fault symptoms reduce the universe of type and location of tests to be
made to a reasonable spectrum for further investigation; that is, the symptoms bound the problem and
establish what is in or out of bounds. This bounding of the problem reduces the number of tests in the
spectrum to a reasonable number and limits the amount of computer memory necessary. We called the "in
bounds" checks that are not logical for the fault symptoms "illogical approach." For a given set of symptoms
for a given fault, there is an optimum troubleshooting path to determine the problem. To prove a component,
or unit, is bad, a number of tests must be performed; this requires testing of the "proof points."

[Figure 2. TAE Factors Model: for a given fault, the model distinguishes the optimum path, proof points,
in-bounds and out-of-bounds tests, and illogical approaches.]

373


The goal in the TAE testing is to find and replace the LRU. Subjects begin TAE testing by reviewing a series
of menus of symptoms, panels, and diagnostic information; next they select equipment to be tested and
conduct tests or replace an LRU.

Research Hypotheses. The 20 hypotheses for the TAE Test and Evaluation were organized into seven
categories: experience, electronics knowledge, electronics performance proficiency, difficulty level, time,
complex test equipment, and ranking. The hypotheses in each category, and the method of testing each, are
described in the following sections.

METHOD<br />

Test Administration Procedures. Testing was conducted by NPRDC personnel in a classroom at the
Advanced Electronics School Department (AESD), Service Schools Command, San Diego, California.
Testing was on the Zenith 248 microcomputer. Technical documentation for the hardware system was in the
classroom. Subjects were assigned randomized test sequences to protect from test order effects. Sixteen
episodes were administered to each subject and each episode required about an hour to complete, but
subjects had no specific time limit. Subjects completed all episodes in two to three days. The administrator
was present in the classroom during testing. Subjects listened to an introduction to the TAE study and the
technical documentation available; read and signed a Privacy Act release statement; and completed a
computerized Learn Program, 2 practice and 14 test troubleshooting episodes. After testing, subjects
received test performance feedback and completed a critique.

Subjects. Subjects for the TAE test and evaluation were students in the “system” phase of the maintenance<br />

course and the system qualified instructors. All subjects were required to have school training on the<br />

subsystems.<br />

Data. Data were collected for 53 students and 13 instructors in two data bases, using a standard
statistical package for analysis. The first contained demographic data; the second, performance data. Data
were collected for seven classes of students between April and September 1989. Demographic data for each
student included: SSN, time in service, Armed Services Vocational Aptitude Battery (ASVAB) scores, school
subsystem scores, school comprehensive score, school final score, class ranking, TAE ranking, and instructor
ranking. Demographic and TAE performance data for instructors were collected during September 1989. The
demographic data for instructors included SSN, rate/rating, time in service and paygrade, time system qualified,
and time working on the system in the fleet and as a system instructor. The TAE program data for both students
and instructors consisted of scores for 16 episodes encompassing 673 variables. Table A-1 describes the
variables for each episode (Episode 1 is presented).

Data files were refined and evaluated. Data for five students were dropped due to missing data, and for
two instructors due to lack of system qualification. Thus, the data of 59 subjects were used for this study, 48
students and 11 instructors. The resultant data base was used to create files for testing the study hypotheses.
The master file was used to create files with variables specifically required to test each hypothesis. The
methods for testing the hypotheses are described in the following subsections.

RESULTS and DISCUSSION<br />

Results of the data analyses are presented in Appendix A, and the specific areas investigated are discussed<br />

in the following:<br />

Demographic Data. For the 48 students, the average time in service was 2.23 years. For the 11 instructors,
9 had a rate of electronics technician first class (ET1) and 2 of ET2; the average paygrade was 5.82. The
average time in service for instructors was 10.41 years and average time in paygrade was 3.64 years.
Instructors were system qualified for an average of 4.67 years and had worked on the system hardware in the
fleet an average of 2.94 years. In addition, they averaged 16.18 months as instructors.


Experience (Table A-2). Hypothesis 1. Instructors (experts) will score significantly higher on the TAE test
than students (novices). A one-way analysis of variance (ANOVA) was performed to test hypothesis 1. The
F ratio value is not significant.
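The comparison behind Hypothesis 1 is a standard one-way ANOVA on the TAE scores of the two groups. A
minimal sketch follows; the score vectors are invented placeholders, not the study data.

    # One-way ANOVA comparing instructor and student TAE scores (Hypothesis 1).
    # The example scores are placeholders, not data from the study.
    from scipy.stats import f_oneway

    student_scores = [68.0, 72.5, 71.0, 69.5, 74.0]
    instructor_scores = [73.0, 75.5, 70.0, 76.0]

    f_ratio, p_value = f_oneway(student_scores, instructor_scores)
    print(f"F = {f_ratio:.3f}, p = {p_value:.4f}")   # H1 unsupported if p > .05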

Hypothesis 2. Sub’ects with a longer time in the electronics rate (i.e., Time in Service - TIS) will score<br />

significantly higher on tlle TAE test than subjects with less time in that rate.<br />

Generally, the relationship between experrence and TAE performance was not statistically significant. This<br />

apparent anomaly may be explained by the fact that instructors of the course are not required to be system<br />

qualified. Students must prove their system qualification to graduate.<br />

The lack of a significant relationship between experience and troubleshooting performance causes one to<br />

uestion if the experience measures were appropriate, if an appropriate set of subjects was tested, if the TAE<br />

1 elivery and evaluation systems are valid, or if there is actually no difference due experience. Given the face<br />

validity of TAE and the high level of expectation by subject matter experts of the relationship between<br />

experience and Performance, further testing is needed to resolve this issue.



Electronics Knowledge (Table A-3). Hypothesis 3. Students with higher academic school final scores
will score higher on the TAE test than students with lower scores. The correlation between academic school
final scores (overall course final score) and TAE test scores is significant at the .05 level. However, the correlation
between academic school comprehensive scores (final test) and TAE test scores is positive but not
significant. Therefore, academic school final scores were significantly correlated with TAE test scores, but
school comprehensive scores were not.

Hypothesis 4. Students with higher academic school subsystem test scores will score higher on the TAE

subsystem tests (episodes) than students with lower school subsystem test scores. For Subsystem 1, the<br />

correlation of academic school subsystem test scores with TAE subsystem test scores is significant at the .05<br />

level. Subsystem 2 has a positive correlation, which is not significant. Both Subsystems 3 and 4 have negative<br />

correlations, which are not significant. Therefore, the only significant correlation between academic school<br />

subsystem test scores and TAE subsystem test scores was for Subsystem 1 (the computer).<br />

Hypothesis 5. Students with higher appropriate Armed Services Vocational Aptitude Battery (ASVAB)
scores for Electronics Technician selection in general science, electronics information, mathematics
knowledge, arithmetic reasoning ([GS + EI + MK] + AR), and the armed forces qualification test (AFQT), will
score higher on the TAE test than subjects with lower ASVAB and selection scores. All but one of the
correlations are negative. The only significant correlation between ASVAB scores and TAE score is Arithmetic
Reasoning (AR), with a negative correlation significant at the .05 level. The only positive correlation is between
General Science (GS) and TAE score, which was not significant.

There was no generally consistent relationship between electronics knowledge and TAE performance.<br />

There was a relationship where performance testing was a component of the academic score used. There<br />

was, however, a negative relationship between the scores used to determine selection to the occupational<br />

speciality (electronic technician) and performance scores.<br />

The lack of a relationship between electronic theory or academics and troubleshooting performance needs further
investigation. As with a number of other studies of this type, there was no consistent relationship between
knowledge of theory and the ability to perform. This may have been related to the method of determining
knowledge and academic success in the school. Testing in the school does not appear to provide
discriminatory capability, and correlational analyses do not show statistically significant results. Schools
should ensure tests discriminate between students' academic and performance ability and assess student
behaviors in a more structured, formalized, objective way. Otherwise, effects of a change to instructional
methods or techniques cannot be assessed in terms of course outcomes. Further TAE testing might determine
the resulting relationships.

Also, the relationships of selection requirements and troubleshooting performance need further<br />

investigation. Of greatest interest is the failure of performance results to positively relate to the ASVAB scores<br />

used to select personnel for this occupational speciality. The consistent negative trend seems to indicate<br />

that, while the ASVAB tests may relate to academic performance, there may be no relationship between<br />

ASVABs, TAE performance, and/or on-the-job performance.<br />

Electronics Performance Proficiency (Table A-4). Hypothesis 6. Subjects with a higher level of
troubleshooting proficiency will make fewer invalid checks than less proficient subjects. The correlation
between TAE score and the number of invalid checks is not significant.

Hypothesis 7. Subjects with a higher level of troubleshooting proficiency will make fewer illogical<br />

approaches than less proficient subjects. The correlation between TAE score and the number of illogical<br />

approaches is significant at the .Ol level.<br />

Hypothesis 8. Subjects with a higher level of troubleshooting proficiency will make fewer incorrect<br />

solutions than less proficient subjects. The correlation between the TAE score and the number of incorrect<br />

solutions is significant at the .001 level.

Hypothesis 9. Sub’ects with a higher level of troubleshooting proficiency will make fewer redundant checks<br />

than less proficient sub jects. The correlation between TAE score and the number of redundant checks is not<br />

significant.<br />

Hypothesis 10. Subjects with a higher level of troubleshooting proficiency will test significantly more proof
points than less proficient subjects. The correlation between the TAE score and the number of proof points
is significant at the .001 level.

Hypothesis 11. Subjects with a higher level of troubleshooting proficiency will make significantly fewer
tests than less proficient subjects. The correlation between the level of troubleshooting proficiency and
number of tests is significant at the .001 level.

The only proficiency factors that failed to show significance were invalid and redundant checks, which<br />

could have been caused by design of the delivery system and/or the method of determining these factors.<br />

This set of hypotheses strongly supports the validity of the TAE technique and approach.

The utility of the TAE as a job performance measure and as an objective measure of readiness in the skill<br />

area addressed (in this case, system troubleshooting) should be investigated further.<br />

Difficulty Level (Table A-5). Hypothesis 12. The more difficult the episodes, the longer the average time
needed to find the solution. The correlation of TAE difficulty with length of time to find the solution is significant
at the .001 level.

Hypothesis 13. On episodes of equal difficulty, subjects with a higher level of troubleshooting proficiency
will take significantly less time than less proficient subjects in finding the solution. Episode difficulty levels
were determined and episodes were grouped, with level 1 being the easiest and level 5 the most difficult, as

375<br />



follows: (1) 2 episodes, (2) 4 episodes, (3) 3 episodes, (4) 2 episodes, and (5) 3 episodes. Hypothesis 13 was
significantly supported for each level.

Hypothesis 14. The more difficult the episode, the less time the instructors will take to find the TAE test<br />

solutions when compared to the students (novices). The difficulty level of the episode and the difference in

time between instructors and students to find TAE test solutions is negatively correlated but not significant.<br />

Although no significant difference was found, the more difficult the episode, the less time instructors tended<br />

to take to find the TAE test solutions when compared to the students.<br />

Generally, the results were as expected; that is, the more difficult, the more time; at different levels of<br />

difficulty, better performers took less time. An unexpected result was the lack of significant difference between<br />

students and instructors. The difference was, however, strongly in the direction expected.<br />

The consistently significant relationship in this area clearly calls for further investigation and improvement,

particularly in behavioral and cognitive task analyses.<br />

Time (Table A-6). Hypothesis 15. Subjects with a higher level of troubleshooting proficiency will take
significantly less total time to find TAE episode solutions than less proficient subjects. The correlation between
TAE score and total time to find the episode fault is significant at the .001 level.

Hypothesis 16. Subjects with higher levels of troubleshooting proficiency will take a significantly longer
time than less proficient subjects before making the first test point. The correlation between TAE score and
time to first test point is significant at the .05 level.

Results suggest that analysis of behavior and cognitive protocols could result in a dramatic change in the

way the training community presents troubleshooting training. Here again, behavioral protocol analysis could<br />

provide useful information on training approaches.<br />

Complex Test Equipment (Table A-7). Hypothesis 17. Subjects with a higher level of troubleshooting
proficiency will make significantly more tests using an oscilloscope than less proficient subjects. The
correlation between TAE score and the number of oscilloscope tests is not significant.

Given the nature of the hardware system and the resulting TAE delivery system, subjects did not appear
to have sufficient opportunity to use complex test equipment in the TAE episodes. Therefore, the lack of a
statistically significant result may have no practical meaning.

Ranking (Table A-8). Hypothesis 18. The higher the student's TAE class rank, the higher the student will
be ranked in terms of troubleshooting proficiency by instructors or work center supervisors. Hypothesis 18
was supported for two classes at the .001 level. The correlation between TAE class ranking and instructor/work
center supervisor ranking was not significant for the other classes. Although not significant, two classes had
an inverse relationship.

Hypothesis 19. The higher the student's TAE class rank (final score), the higher will be the student's
ranking in the class. Hypothesis 19 was supported for one class at the .01 level of significance. For the other
classes, the correlation between TAE class ranking and ranking in school class was not significant. Although
not significant, two classes indicated a strong positive correlation. Conversely, one class showed a strong
inverse relationship between TAE class ranking and school class ranking.

Hypothesis 20. The higher the instructor ranking of the student in terms of troubleshooting proficiency,<br />

the higher will be the student’s ranking in the class (final score). Hypothesis 20 was supported for three<br />

classes, one class at the .001 level and two at the .05 level. Although not significant, one class showed a strong

positive correlation between instructor student ranking and class student ranking. One class showed a weaker<br />

positive correlation and two classes indicated an inverse relationship.<br />

There were no consistent results in rankings across instructors, TAE performance, or school performance.
In several classes, inverse relationships were shown. Only one class had a consistent significant relationship
across hypotheses.

The results in this area most clearly attest to the need for an objective evaluation tool for the skill of
troubleshooting. They show that supervisor rankings and school results do not have the ability to evaluate
personnel in this skill.

FUTURE EFFORTS<br />

In addition to the recommendations made for each area of investigation, we also have the following general<br />

recommendations for future efforts in this area.<br />

1. Further investigate TAE validity and reliability. Design and development of the TAE approach and
delivery system strongly support face validity of TAE. Subject matter experts were involved in all phases of
the project. They determined the factors of evaluation, weights of the factors, and the evaluation scheme;
selected and developed the troubleshooting episodes to be used; and participated in the test and evaluation.
Since T&E results are somewhat ambiguous, areas dealing with validity and reliability should be investigated
further.

2. Analyze data to further develop discriminatory/predictive capability. Results of the performance of
subjects on TAE episodes should be subjected to behavioral protocol analyses to develop a model of
troubleshooting, to further analyses of the approaches used by good vs. bad troubleshooters, and ultimately
to cognitive protocol analyses to determine selection, training and evaluation requirements.

376


3. Further test the TAE approach on a larger and more comprehensive population and on other
equipment. Further investigation should use hardware that allows wider and less restrictive utilization of test
equipment. It may also be possible to select specific troubleshooting episodes that enable wider utilization
of more types of test equipment. This type of investigation should take place to determine if certain episodes
and hardware types require special test equipment use capability. Investigate this approach in other high-tech
hardware systems as well as other occupational areas (i.e., mechanical hardware troubleshooters/repair
personnel). A TAE-type delivery system should be developed for a number of other high- and mid-tech
hardware systems.

4. Develop more troubleshooting episodes to provide directive training, guided training, and tests with<br />

feedback. Then, a complete and comprehensive troubleshooting skill development, maintenance,<br />

assessment, and evaluation program would be available for personnel from novice to expert skill levels. TAE

could be used for active duty personnel in a school or fleet environment and for reserve personnel at the<br />

readiness centers or aboard ship during active duty periods.<br />

For greater detail on the background, design/development and administration, and the test and evaluation
results, consult: Conner and Hassebrock (in press); Conner, Hartley, and Mark (in press); and Conner,
Poirier, Ullrich, and Bridges (in press).

REFERENCES

Conner, H. B. (1988, October). Troubleshooting Proficiency Evaluation Project (TPEP). In Proceedings of the
Military Testing Association Conference, Mystic, Connecticut.

Conner, H. B. (1987, April). Troubleshooting Proficiency Evaluation Project (TPEP). In Proceedings of the
National Security Industrial Association Manpower and Training Conference.

Conner, H. B., & Hassebrock (in press). Troubleshooting Assessment and Enhancement (TAE) Program:
Theoretical, Methodological, Test and Evaluation Issues. San Diego: Navy Personnel Research and
Development Center.

Conner, H. B., Hartley, S., & Mark, L. J. (in press). Troubleshooting Assessment and Enhancement (TAE)
Program: Test and Evaluation. San Diego: Navy Personnel Research and Development Center.

Conner, H. B., Poirier, C., Ullrich, R., & Bridges, T. (in press). Troubleshooting Assessment and Enhancement
(TAE) Program: Design, Development, and Program. San Diego: Navy Personnel Research and
Development Center.

Nauta, F. (1984). Analyzing Fleet Maintenance Research (NAVTRAEQUIPCEN MDA903-81-0188-1).
Orlando: Naval Training Equipment Center.

377<br />



APPENDIX A
TAE DATA and ANALYSIS RESULTS

TABLE A-1. Variables for TAE Episode 1

Variable  Contents of Variable
V1    Subject's Social Security Number
V2    Equipment (hardware subsystem) number (1 = USH26)
V3    Episode number (1)
V4    Found Solution (1 = Yes, 0 = No)
V5    Number of Test Points
V6    Number of Out-of-Bounds tests
V7    Number of Valid Checks
V8    Number of Invalid Checks
V9    Number of Redundant Checks
V10   Number of Proof Points subject tested
V11   Total number of Proof Points in the episode
V12   % proof pts tested: (V10/V11)*100, rounded to a whole number
V13   Total Time spent on the episode (in minutes)
V14   To be determined
V15   Number of Equipment Selection events
V16   Number of Front Panel events
V17   Number of Maintenance Panel events
V18   Number of Fallback test events
V19   Number of Reference Designator test events
V20   Number of Replace LRU events
V21   Number of Review Symptoms events
V22   To be determined
V23   Number of Diagnostic Test events
V24   Number of Load Operational Program events
V25   Number of Step Procedure events
V26   Number of Revision events (instructor intervention)
V27   Number of INCORRECT Replace LRU events
V28   Number of GOOD FAULT Replace LRU events
V29   Time to first Reference Designator Test (in minutes)
V30   Time to first Diagnostic Test (in minutes)
V31   Sum of all steps of episode: ALL events, except Inst. actions
V32   Number of Waveform tests performed
V33   Number of Voltage tests performed
V34   Number of Read Meter tests performed
V35   Number of Logic tests performed
V36   Number of Current tests performed
V37   Number of Frequency tests performed
V38   Number of Continuity tests performed
V39   Number of Adjustment tests performed
V40   Final Score of the episode
V41   To be determined - for possible future expansion
V42   To be determined - for possible future expansion
V43   To be determined - for possible future expansion

TABLE A-2. Experience

H1   Student TAE Test Score vs. Instructor TAE Test Score

Group         Mean     N
Students      70.396   48
Instructors   73.422   11
Grand Mean    70.980   59

Variable 1: TAESCORE
Source     Sum of Sqs   D.F.   Mean Sq   F Ratio   Prob.
Between      81.973       1     81.973     2.271    .1373
Within     2057.124      57     36.090
Total      2139.098      58

Correlational Hypothesis Statement      N    Correlation   Critical Value
H2   TAE Score vs TIS                  59      .13676         .21638

TABLE A-3. Electronic Knowledge

Correlational Hypothesis Statement               N    Correlation   Critical Value
H3   TAE vs School Final                        48      .30181*        .24045
     TAE vs School Comp                         48      .17311         .24045
H4   Avg. TAE Subsystem 1 vs. School Subsys 1   48      .27704*        .24045
     Avg. TAE Subsystem 2 vs. School Subsys 2   48      .17579         .24045
     Avg. TAE Subsystem 3 vs. School Subsys 3   48     -.18146         .24045
     Avg. TAE Subsystem 4 vs. School Subsys 4   48     -.21972         .24045
H5   TAE vs ASVABs                              48
     AFQT                                               -.00398         .24045
     AR                                                 -.32510*        .24045
     EI                                                 -.96673         .24045
     ASVAB1                                             -.02672         .24045
     ASVABT                                             -.13055         .24045

TABLE A-4. Electronic Performance Proficiency

Correlational Hypothesis Statement      N    Correlation   Critical Value
H6   TAE vs Invalid Checks             59     -.17107         .21638
H7   TAE vs Illogical Approaches       59     -.34057**       .21638
H8   TAE vs Incorrect Solutions        59     -.69676***      .21638
H9   TAE vs Redundant Checks           59     -.98543         .21638
H10  TAE vs Proof Points               59      .56997***      .21638
H11  TAE vs # of Tests                 59     -.55201***      .21638

378<br />



* p < .05
** p < .01
*** p < .001
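The critical values in these tables can be reproduced by converting a critical t value into a critical r. The
sketch below does this; the use of a one-tailed .05 criterion is an assumption on my part, made because it
reproduces the tabled .21638 for N = 59, and is not stated in the paper.

    # Deriving a critical value for Pearson r from the t distribution.
    # One-tailed alpha = .05 is an assumption; with N = 59 it gives ~.216,
    # matching the tabled .21638 to rounding.
    from math import sqrt
    from scipy.stats import t

    def critical_r(n, alpha=0.05, one_tailed=True):
        df = n - 2
        t_crit = t.ppf(1 - alpha if one_tailed else 1 - alpha / 2, df)
        return t_crit / sqrt(t_crit ** 2 + df)

    print(round(critical_r(59), 5))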

TABLE A-5. Difficulty Level

Correlational Hypothesis Statement      N    Correlation   Critical Value
H12  Ep Diff vs. Ep Time               14      .93051***      .459
H13  Ep Diff Lev vs Time
     Level 1 (Easiest)                        -.81265***      .21638
     Level 2                                  -.3?04**        .21638
     Level 3                                  -.74653***      .21638
     Level 4                                  -.73553**       .21638
     Level 5 (Hardest)                        -.58708***      .21638
H14  Ep Diff vs. Time Dif              14     -.34658         .459

TABLE A-6. Time

Correlational Hypothesis Statement      N    Correlation   Critical Value
H15  TAE vs Time                       59     -.49233***      .21638
H16  TAE vs Time to 1st Check          59     -.23814*        .21638

TABLE A-7. Complex Test Equipment

Correlational Hypothesis Statement      N    Correlation   Critical Value
H17  TAE vs Oscilloscope use           59      .18771         .21638

TABLE A-8. Ranking

Correlational Hypothesis Statement
H18  TAE Ranking vs Inst Ranking, Classes 1-7
H19  TAE Rank vs Class Rank, Classes 1-7
H20  Class Rank vs Inst Ranking, Classes 1-7

379

N<br />

7<br />

7<br />

,B<br />

8<br />

8<br />

7<br />

7<br />

7<br />

i<br />

8 6<br />

7<br />

7<br />

7 8<br />

ii<br />

9<br />

7<br />

Correlation<br />

.96429***<br />

35714<br />

A6429<br />

.4S571<br />

- .14286<br />

- .07143<br />

.96429*‘*<br />

.8Q286**<br />

.57143<br />

- .14288<br />

46571<br />

- .37143 sQ524<br />

.60714<br />

.96429***<br />

- 35714 .02381<br />

.75ax*<br />

:Z*<br />

64286<br />

Critical Value<br />

.87649<br />

67649<br />

.87649<br />

.73972<br />

.73972<br />

.82658<br />

47849<br />

m649<br />

67549<br />

.67649<br />

.73Q72<br />

.82658 .73972<br />

.87649<br />

.67649<br />

67649 .82658<br />

67649<br />

a2558<br />

.686Q7<br />

.67649<br />

. .


Incrementing ASVAB Validity with<br />

Spatial and Perceptual-Psychomotor Tests<br />

Henry H. Busciglio<br />

U. S. Army Research Institute<br />

The Army's Project A is a long-term, comprehensive effort to<br />

improve the selection and classification of enlisted personnel.<br />

One objective of this effort was to develop and validate measures<br />

of abilities other than the general cognitive domain covered by<br />

the Armed Services Vocational Aptitude Battery (ASVAB), including<br />

spatial, perceptual, and psychomotor abilities. Previous<br />

analyses of Project A data (Campbell, 1988) showed that the ASVAB

is useful for predicting first tour performance. Therefore, the<br />

ASVAB serves as a baseline against which the marginal utility of<br />

other tests for selection and classification is judged. This<br />

analysis of data collected during the 1985 Project A Concurrent<br />

Validation attempted to answer three questions:<br />

(1) How much of the variance in comprehensive performance<br />

measures can spatial and perceptual-psychomotor tests account<br />

for, over and above that predicted by ASVAB subtests?<br />

(2) Is either type of test, spatial or perceptual-psychomotor,<br />

more useful for incrementing ASVAB validity?<br />

(3) Which specific Project A tests will make the highest<br />

individual contributions to this incremental validity?<br />

Method

Subjects

Subjects were first-term enlisted personnel in the nine MOS<br />

for which hands-on criterion measures were collected as part of<br />

the 1985 Concurrent Validation phase of Project A. The number of<br />

subjects from each MOS, as well as the total sample size, is<br />

shown in Table 1.<br />

Predictors<br />

Predictors were the nine ASVAB subtests, the six Project A<br />

paper-and-pencil tests of spatial ability, and 14 selected scores<br />

from the ten Project A computerized perceptual-psychomotor tests.<br />

Table 2 presents a list of these predictors, along with the<br />

specific perceptual-psychomotor scores used.<br />

Presented at the meeting of the <strong>Military</strong> <strong>Testing</strong><br />

<strong>Association</strong>, November, 1990. All statements expressed in this<br />

paper are those of the author and do not necessarily reflect the<br />

Official opinions or policies of the U.S. Army Research Institute<br />

or the Department of the Army.<br />

380


Table 1

Subjects

MOS            Enlisted Job                      N
11B            Infantry                          491
13B            Cannon Crew                       464
19E            Armor Crew                        394
31C            Single Channel Radio Operator     289
63B            Light Wheel Vehicle Mechanic      478
64C (now 88M)  Motor Transport Operator          507
71L            Administrative Specialist         427
91A            Medical Specialist                392
95B            Military Police                   597
TOTAL                                          4,039

Note. Actual sample sizes for some analyses were smaller than those shown.

Table 2

Predictor Measures

ASVAB Subtests:
  Mechanical Comprehension
  Auto/Shop Information
  Electronics Information
  Math Knowledge
  Arithmetic Reasoning
  Verbal (Paragraph Comprehension + Word Knowledge)
  General Science
  Coding Speed
  Number Operations

Spatial Ability Tests:
  Assembling Objects
  Map
  Maze
  Object Rotation
  Orientation
  Figural Reasoning

Perceptual-Psychomotor Tests and Scores:
  Target Tracking 1 - accuracy
  Target Tracking 2 - accuracy
  Target Shoot - accuracy and time-to-fire
  Cannon Shoot - time discrepancy (from optimal)
  Simple Reaction Time - decision time
  Choice Reaction Time - decision time
  Short-Term Memory - decision time and proportion correct
  Perceptual Speed and Accuracy - decision time and proportion correct
  Target Identification - decision time and proportion correct
  Number Memory - response time

381


Criterion Measures

All criteria were comprehensive, "can-do" measures of job
performance, as listed and described below.

Total Score on Written Tests: measures of soldiers'<br />

technical knowledge pertinent to the various "critical tasks"<br />

performed in each MOS.<br />

Total Score on Hands-On Tests: measures of soldiers' ability<br />

to actually carry out the 14 to 17 major job tasks in each MOS.<br />

General Soldiering Proficiency: a composite score on written

and hands-on tests of tasks common to many MOS (e.g., determining<br />

grid coordinates on maps, recognizing friendly/threat aircraft).<br />

Core (i.e., MOS-specific) Technical Proficiency: a composite
score on written and hands-on tests of tasks that are at the
"core" of each MOS (i.e., those that define the MOS).

Skill Qualification Test Score (SQT): written tests of MOS-specific
technical knowledge developed by the U.S. Army Training
and Doctrine Command for periodic testing of soldiers in their MOS.

The comprehensive measures above are not mutually exclusive.<br />

Written and hands-on test scores were used in the computation of<br />

General Soldiering and Core Technical Proficiency, as well as the<br />

total scores for written and hands-on tests.

Procedure<br />

Collection of Project A predictor and criterion data was<br />

part of the 1985 concurrent validation. Scores on the ASVAB<br />

subtests and the Skill Qualification Test were obtained from<br />

archival data sources.<br />

A series of backward stepwise multiple regression analyses was performed separately for each MOS. An SPSS Regression

program sequentially entered blocks of ASVAB, spatial, and<br />

perceptual-psychomotor tests, removing nonsignificant tests in<br />

each block before entering the next block. Two orders of entry<br />

were used. In both cases the ASVAB tests were entered first; in<br />

one analysis spatial tests were entered second, followed by the<br />

perceptual-psychomotor; in the other analysis this order was<br />

reversed. Results were corrected for restriction-of-range in the<br />

ASVAB scores (Lawley formula; Lord and Novick, 1968), and<br />

adjusted for shrinkage (Wherry, 1940).<br />
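For readers who want to reproduce the general logic, the blockwise entry-and-elimination step and the Wherry adjustment can be sketched as follows. This is only a rough illustration, not the SPSS Regression program actually used: the .05 retention criterion, the data frame `df`, the criterion name, and the column lists are hypothetical, ordinary least squares via statsmodels stands in for the original software, and the Lawley range-restriction correction is omitted.

```python
import statsmodels.api as sm

def backward_eliminate(y, X, candidate_cols, alpha=0.05):
    """Fit OLS, then drop nonsignificant predictors one at a time,
    but only from the most recently entered block (candidate_cols)."""
    cols = list(X.columns)
    while True:
        fit = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvals = fit.pvalues.drop("const")
        weak = pvals[pvals.index.isin(candidate_cols) & (pvals > alpha)]
        if weak.empty:
            return cols, fit
        cols.remove(weak.idxmax())

def wherry_adjusted_r2(r2, n, k):
    """Wherry (1940) shrinkage adjustment: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def blockwise_r2(df, criterion, blocks, alpha=0.05):
    """Enter blocks of predictors in order, eliminating nonsignificant
    tests within each block before the next block is entered, and
    report the shrinkage-adjusted R^2 at each stage."""
    kept, stages = [], []
    for block in blocks:
        kept, fit = backward_eliminate(df[criterion], df[kept + list(block)], block, alpha)
        stages.append(wherry_adjusted_r2(fit.rsquared, len(df), len(kept)))
    return kept, stages

# Hypothetical call, with the ASVAB block entered first and spatial before psychomotor:
# blockwise_r2(df, "hands_on_total", [asvab_cols, spatial_cols, psychomotor_cols])
```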

Results<br />

Table 3 shows the proportion of variance explained (R2) by<br />

the significant predictors of the criteria at each stage of<br />

382


Table 3<br />

Proportion of Criterion Variance (R2) Accounted for by<br />

Significant Predictors (Median Values Across MOS)<br />

Stage                      (1)      (2a)      (3a)      (2b)      (3b)
Predictors Retained        ASV      ASV+Sp    ASV+Sp    ASV+P/M   ASV+P/M
                                              +P/M                +Sp
Predictors Entered         ASV      Sp        P/M       P/M       Sp

Written Tests:             .59      .64       .65       .61       .65
Hands-On Tests:            .29      .33       .33       .31       .33
General Soldiering:        .47      .51       .53       .50       .53
Core Technical:            .44      .49       .50       .48       .51
Skill Qualification:       .53      .54       .55       .54       .55

analysis. A comparison of column 1 with columns 3a and 3b<br />

indicates that spatial and perceptual-psychomotor test scores<br />

substantially improved the prediction of Written Test scores,<br />

General Soldiering Proficiency, and Core Technical Proficiency.<br />

Increases in R2s for Hands-On Tests and the Skill Qualification

Test Score were more modest.<br />

Regarding the relative usefulness of spatial vs. perceptual-psychomotor

tests for incrementing the prediction of the<br />

criteria, columns 2a and 2b of Table 3 show median incremental<br />

R2s (across MOS) of spatial vs perceptual-psychomotor predictors<br />

at Stage 2. Spatial tests were slightly better than perceptual-psychomotor

scores for improving the prediction of the criteria.<br />

The third research question concerns the validities and<br />

incremental validities of individual Project A tests. Table 4<br />

lists the three best spatial, perceptual, and psychomotor tests,<br />

in terms of frequency and magnitude of significant effects. For<br />

the tests of spatial ability, Assembling Objects, Figural<br />

Reasoning, and Map were superior incremental predictors. Among<br />

the perceptual scores, Target Identification (% correct), Short<br />

Term Memory (% correct), and Number Memory (response time) were<br />

especially useful as incremental predictors. For the psychomotor<br />

scores, 1- and 2-Hand Tracking (accuracy), and Target Shoot

(time-to-fire) were the best.<br />

Discussion<br />

In these analyses Project A test scores substantially<br />

improved the prediction of the criteria. The results for Total<br />

Score .on Written Tests and General Soldiering Proficiency support<br />

the wide generalizability of Project A incremental validity.<br />

Specifically, the first measure may involve highly different<br />

content across MOS, while the second measures a set of more<br />

383


Table 4<br />

Best Spatial, Perceptual, and Psychomotor Tests<br />

Number of Equations Range of Median<br />

Project A Where Significant Semi-partial<br />

Tests (Maximum=86) Correlations<br />

Spatial:
  Assembling Objects                       48          .06 - .11
  Figural Reasoning                        40          .07 - .12
  Map                                      33          .07 - .14

Perceptual:
  Target Id. - % correct                   32          .07 - .10
  Short Term Memory - % correct            25          .07 - .14
  Number Memory - response time            25           ns - .07

Psychomotor:
  2-Hand Tracking - accuracy               20          .05 - .15
  1-Hand Tracking - accuracy               18          .05 - .10
  Target Shoot - time-to-fire               6         -.09 - -.07

common tasks, but does so using both written and hands-on scores.<br />

Although spatial tests were slightly superior to the<br />

perceptual-psychomotor scores as incremental predictors, the<br />

latter group of measures accounted for criterion variance which<br />

is not redundant with the spatial tests. This is important<br />

because the perceptual-psychomotor tests require expensive<br />

computer hardware and software and must be administered<br />

individually. Thus, their utility should be considered<br />

separately with each selection or classification decision.<br />

These analyses also revealed that some individual Project A<br />

tests were significant incremental predictors across a wide<br />

variety of MOS and criteria (see Table 4). These measures are<br />

therefore strong candidates for addition to ASVAB.<br />

To interpret these results properly, a number of<br />

methodological considerations should be noted. First of all,<br />

ASVAB scores were employed for selection, while the Project A<br />

scores were used "for research purposes only."

have responded more carefully, exerted more effort, etc., on the<br />

ASVAB subtests, thus making them more valid measures of abilities<br />

than the Project A tests. Another concern is a statistical one.<br />

Although the samples used were large enough to make the degree of<br />

shrinkage in each individual equation relatively low, the large<br />

number of equations computed increases the probability that

384<br />

. .


some ASVAB and Project A predictors were significant due to Type<br />

I errors. Although most of the Project A tests were significant<br />

far more often than the chance level (cf. the middle column of<br />

Table 4), the lack of opportunities at this point for cross-validation

renders the results reported in this paper exploratory<br />

and suggestive only.<br />

The Longitudinal Validation of Project A, which began in<br />

1986/87, will provide more definitive answers to the research<br />

questions involved in these analyses. Based upon the preliminary<br />

results reported here, we are optimistic about the findings of<br />

the Longitudinal Validation.<br />

References<br />

Campbell, C.H. (in preparation). Developing basic criterion scores for hands-on tests, job knowledge tests, and task rating scales (Draft of ARI Technical Report).

Campbell, J.P. (Ed.). (1988). Improving the selection, classification, and utilization of Army enlisted personnel: Annual report, 1986 fiscal year (ARI Technical Report 792). Alexandria, VA: U.S. Army Research Institute.

Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.

Davis, R.H., Davis, G.A., Joyner, J.N., & de Vera, M.V. (1987). Development and field test of job relevant knowledge tests for selected MOS (ARI Technical Report 757). Alexandria, VA: U.S. Army Research Institute.

Lord, F., & Novick, M. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley Publishing Co.

Pedhazur, E.J. (1982). Multiple regression in behavioral research (2nd ed.). New York, NY: Holt, Rinehart and Winston.

Peterson, N.G. (Ed.). (1987). Development and field test of the trial battery for Project A (ARI Technical Report 739). Alexandria, VA: U.S. Army Research Institute.

Wherry, R.J. (1940). Appendix A. In W.H. Stead and C.P. Shartle (Eds.), Occupational counseling techniques. New York: American Book Company.

385<br />

. .


Item Content Validity: Its Relationship<br />

With Item Discrimination and Difficulty<br />

Teresa M. Rushano<br />

USAF Occupational Measurement Squadron<br />

At the USAF Occupational Measurement Squadron (USAFOMS), subject-matter experts<br />

(SMEs) rate the questions on promotion tests for content validity.<br />

They also use standard statistical criteria to determine whether test questions<br />

should be reused on subsequent test revisions. The purpose of this<br />

research was to explore the relationship between SME content validity ratings<br />

(CVRs) and item statistics.<br />

The Specialty Knowledge Tests (SKTs) used for enlisted promotions in the Air<br />

Force are written at USAFOMS by senior NCOs acting as SMEs under the guidance

of USAFOMS psychologists. Within each specialty, one SKT is prepared for<br />

promotion to staff sergeant (E-5), and one for promotion to technical and

master sergeant (E-6 and E-7).<br />

The USAFOMS test development process includes a procedure based on the methodology<br />

of Lawshe (1975) for quantifying content validity on the basis of<br />

essentiality to job performance. As part of the process of revising an existing<br />

SKT, each SME independently assigns each test question a rating using<br />

the following scale:<br />

Is the skill (or knowledge) measured by this test question:
    Essential (2),
    Useful but not essential (1), or
    Not necessary (0),
for successful performance on the job?

The SMEs as a team then use these ratings as a point of departure in discussing<br />

whether individual items should be retained on subsequent test revisions.<br />
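As a rough illustration of how such ratings can be summarized per item, the sketch below averages the 0-2 essentiality ratings into a single value and, for comparison, computes Lawshe's (1975) content validity ratio. The panel of ratings is made up, and treating a simple average as the item summary (the "CV-Avg" referred to later) is an assumption rather than a documented USAFOMS formula.

```python
def cv_average(ratings):
    """Mean of the SME essentiality ratings (0, 1, or 2) for one item."""
    return sum(ratings) / len(ratings)

def lawshe_cvr(ratings):
    """Lawshe (1975) content validity ratio, (n_e - N/2) / (N/2), where
    n_e is the number of SMEs who rated the item 'essential' (2)."""
    n = len(ratings)
    n_essential = sum(1 for r in ratings if r == 2)
    return (n_essential - n / 2) / (n / 2)

sme_ratings = [2, 2, 1, 2, 0, 2, 1]          # hypothetical panel of seven SMEs
print(round(cv_average(sme_ratings), 2))      # 1.43 on the 0-2 scale
print(round(lawshe_cvr(sme_ratings), 2))      # 0.14: just over half rated it essential
```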

Perry, Williams, and Stanley (1990) found that CVRs influence SME determina-<br />

tion of an item's test-worthiness and its subsequent selection for continued<br />

use or deactivation. However, the ratings are not the only factors which may<br />

impact the SME decision whether to reuse an item on an SKT. After completing<br />

the CVRs, SMEs review item statistics.<br />

For each SKT question, item statistics are provided which indicate how well<br />

an item is doing on the test. USAFOMS has an established set of statistical

criteria for test items which must be met. Test questions that do not meet

these criteria must be revised in order to be incorporated on the revised<br />

version of the test. The two statistical elements examined in this research

are the difficulty index and discrimination index. The difficulty (DIFF) of

a test item, sometimes known as its ease index, is defined as the total percentage

of examinees on a test who selected each choice. The DIFF value for<br />

the correct answer is examined to see if the item as a whole is too easy or<br />

too hard. For example, an item answered correctly by 97% of the examinees is<br />

considered too easy for the purposes of the SKT and would not be reused on<br />

subsequent test revisions.<br />

The second statistical element used in this research is the discrimination

index (DISC). This statistic is calculated for each item choice by subtract-<br />

ing the percentage of low-scoring examinees (i.e., those scoring in the lower<br />

50% of all examinees) who select a choice, from the percentage of high-scoring<br />

examinees making that choice. If a test question is working properly,


the higher-scoring examinees will answer the question correctly, while the<br />

lower-scoring examinees will select incorrect options. When this occurs, the<br />

correct answer’s DISC will be positive and the incorrect answers will have<br />

negative DISC values.<br />
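A minimal sketch of how DIFF and DISC could be computed from a response matrix follows. The data layout is hypothetical, and the median split into upper and lower halves is simply one way to implement the 50% division described above.

```python
import numpy as np

def item_statistics(responses, keys, item):
    """DIFF and DISC for every option of one item.
    responses: (examinees x items) array of chosen options, e.g. 'A'..'D'
    keys: the correct option for each item (both inputs are hypothetical)."""
    total = (responses == np.asarray(keys)).mean(axis=1)   # proportion correct per examinee
    high = total >= np.median(total)                        # upper-scoring half of examinees
    low = ~high                                             # lower-scoring half
    stats = {}
    for option in np.unique(responses[:, item]):
        chose = responses[:, item] == option
        diff = 100.0 * chose.mean()                         # percent of all examinees choosing it
        disc = 100.0 * (chose[high].mean() - chose[low].mean())
        stats[option] = {"DIFF": round(diff, 1), "DISC": round(disc, 1)}
    return stats

# Tiny hypothetical example: six examinees, two items, item 0 keyed 'B'.
responses = np.array([["B", "A"], ["B", "C"], ["A", "A"],
                      ["B", "B"], ["C", "A"], ["A", "C"]])
print(item_statistics(responses, ["B", "A"], item=0))
```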

METHOD<br />

Content validity ratings and item statistics were obtained from both the E-5<br />

and E-6/7 SKTs of 23 Air Force specialties (AFSs). Table 1 lists the AFSs

examined and their Air Force specialty codes (AFSCs). Using USAFOMS standard<br />

forms, SMEs rated the content validity of each item on the tests they were<br />

revising. The AFSs chosen for this study were those found by Perry et al.

(1990) to have significant (p


Table 1<br />

Air Force Specialties and Specialty Codes<br />

SPECIALTY                                    AFSC

Pararescue/Recovery                          115X0
Visual Information Production                231X3
Airfield Management                          271X1
Air Traffic Control                          272X0
Elec. Comp. and Switching Systems            305X4
Maint. Data Systems Analysis                 391X0
Missile Systems Maintenance                  411X0A
F-15 Avionics Test Station                   451X4
FB-111 Avionics Test Station                 451X6
Photo. and Sen. Maint. Tac/Recon             455X0A
Photo. and Sen. Maint. Recon/E.O.            455X0B
Air Launched Missile Sys. Maint.             466X0
Comm. Computer System                        491X0
Refrigeration and Air Conditioning           545X0
Construction Equipment                       551X1
Production Control                           555X0
Logistics Plans                              661X0
Information Management                       702X0
Manpower Management                          733X1
Radiology                                    903X0
Medical Laboratory                           924X0
Systems Repair                               991X4
Scientific Measurement                       991X5

388


Table 2<br />

Correlation Coefficients of Content Validity Ratings and Item Statistics<br />

15156A *.282 .148 90370 .021 *.230<br />

S156B *.265 .182 92450 *.336 *.263<br />

15176 .009 .148 92470 *.315 *.259<br />

15550A *.240 .177 99154 .064 .055<br />

15570A *.278 ,076 99174 .105 .050<br />

i5550B *.196 .116 99155 .017 .083<br />

i5570B .175 *.210 99175 .004 .130<br />

*Indicates significant correlation (p < .05)

389


an item is for a certain level of test and the two SKTs are constructed inde-<br />

pendently. Typically, both the specialty training standard (STS) and the<br />

occupational survey report (OSR) which are used in the development of SKTs,<br />

show that different levels of knowledge are required for these ranks and that<br />

different types of tasks are associated with E-5 and E-6/7 positions.<br />

Finally, a fourth post hoc analysis was conducted to examine the test populations<br />

of the 48 SKTs studied. It was hypothesized that SKTs with higher test<br />

populations would more likely be the SKTs with significant correlation be-<br />

tween CV-Avg and item statistics since statistics from higher populations are<br />

more reliable. .<br />

The first three post hoc analyses were conducted using chi-square tests of<br />

statistical significance. Even though eight of the 19 career fields with<br />

significant correlation between CV-Avg and DIFF were from the electronic area,<br />

no significant difference was found (p



REFERENCES<br />

Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563-575.

Perry, C. M., Williams, J. E., and Stanley, P. P. (1990). Implementation of content validity ratings in Air Force promotion test construction. Proceedings of the 32nd Annual Conference of the Military Testing Association, 1990.

391


The Air Force Medical Evaluation Test, Basic<br />

Military Training, and Character of Separation

Edna R. Fiedler¹

Wilford Hall Medical Center<br />

Lackland Air Force Base, Texas<br />

Selection procedures and rapid early intervention are two strategies used<br />

by the United States Air Force to reduce the human and monetary costs of<br />

attrition in the enlisted force. Cognitive measures such as the Armed Services<br />

Vocational Aptitude Battery and the Armed Forces Qualification Test have long<br />

been used effectively for academically based screening. Self-reported biographical data (biodata) and personality measures have been used for

screening noncognitive adaptability.<br />

Armed Services biodata techniques have included the Navy’s Recruit<br />

Background Questionnaire (RBQ) and the Army's Military Applicant Profile (MAP) and Assessment of Background and Life Experiences (ABLE). Currently, the Navy, as Executive Agent, has designed the Armed Service Applicant Profile, a combination of the best items from MAP and RBQ (Trent, Quenette, & Laabs, 1990; Laabs, Trent, & Quenette, 1989).

Other studies have used a variety of personality measures to predict basic military training attrition. While Spielberger and Barker (1979) studied the relationships of trait and state anxiety to attrition from basic military training for both Navy and Air Force recruits, Butters, Retzlaff and Gibertini (1986) used the Millon Clinical Multiaxial Inventory (MCMI) to predict 80% of mental health clinic recommended discharge versus return-to-duty dispositions. McCraw and Bearden (1988) have focused on motivational, demographic, and personality test scores of technical training school students referred to a mental health clinic.

Since the 1970's, the Air Force has used the Air Force Medical Evaluation Test (AFMET) to screen out those basic recruits likely to attrite from Basic Military Training. Early work on the development and initial validation of the instrument included the studies by Lachar (1974), and Guinn, Johnson, and Kenton (1975). Bloom (1977, 1980, 1983) reported on the ongoing operational aspects of the program. The interested reader is referred to Crawford's (1990) review of the history of AFMET. This study reports on the efficacy of the instrument used in the first phase, the History Opinion Inventory (HOI), for predicting BMT performance and Character of Separation. In addition, the Gordon Personal Profile (Gordon) and the Minnesota Multiphasic Personality Inventory (MMPI) are discussed in relationship to BMT performance and character of separation.

METHOD<br />

Subjects.<br />

The total sample consisted of all USAF enlisted personnel whose total<br />

Active <strong>Military</strong> Service Date was calendar year 1985 through 1989 and who were<br />

also identified by Wilford Hall USAF Medical Center for testing on the AFMET,<br />

or 171,707 subjects (males = 138,601, females = 33,106). The number of<br />

1 Disclaimer: The views expressed in this paper are those of the author and do not necessarily represent those of the United States Air Force or the Department of Defense. Acknowledgments: The author thanks Melody Darby and Doris Black for their assistance in statistical analyses, Calvin Fresne for his assistance in data management, and Malcolm Ree, Ph.D. for his assistance throughout the study.


392




subjects in each analysis may differ as not all subjects had data on all variables.

Instruments.

Instruments include the HOI, a 50-item, true-false self-reported history of legal, antisocial, school, family, and alcohol problems with a weighted total score range of 0 to 30. Higher scores indicate greater endorsement of problems prior to service. The Gordon is an 18-item questionnaire in which subjects must choose which is most and least like them. Four scores were obtained: Social Ascendancy, Responsibility, Emotional Stability, and Gregariousness. Scores were entered as percentiles, ranging from 1 to 99. The MMPI, a measure of psychopathology, has nine clinical and three validity scales, with raw scores ranging from one to 58.

Procedure.<br />

The HOI was given on the second day of training to all United States Air Force basic trainees to identify high risk recruits. The Gordon was given on the 6th day of training, during Phase II testing of identified high risk recruits. Any recruit referred to a credentialed provider based on Phase II results was given the MMPI (currently the MMPI-2) prior to a clinical evaluation by a psychologist or psychiatrist. Only after an evaluation by a psychologist was a recruit recommended for discharge or for return to duty.

Analyses include analysis of variance, pooled variance t-test, Pearson<br />

correlation, multiple regression, Cronbach coefficient alpha and the<br />

Wherry-Gaylord estimation of reliability of composites.<br />

RESULTS<br />

History Opinion Inventory.<br />

Reliability as measured by the Wherry-Gaylord procedure for weighted composites was .84. Internal consistency among all the items was .57, using Cronbach's coefficient alpha. The substantially lower reliability using Cronbach's alpha demonstrates that the instrument is multidimensional and the Wherry-Gaylord is the more appropriate index of reliability.
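The item-level coefficient reported above can be computed with the standard alpha formula; the sketch below is generic (the 0/1 response matrix is made up), and the Wherry-Gaylord composite estimate is not reproduced here.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's coefficient alpha for an (examinees x items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical true-false (0/1) responses: five examinees, four items.
x = [[1, 1, 0, 1],
     [0, 1, 0, 0],
     [1, 1, 1, 1],
     [0, 0, 0, 1],
     [1, 1, 1, 1]]
print(round(cronbach_alpha(x), 2))
```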

Table 1 shows that recruits who graduated from BMT had significantly lower scores on the HOI than those who were discharged. A correlation analysis, corrected for unreliability, showed the HOI accounted for 31% (r=.31) of the predictive efficiency for BMT graduation/discharge.

Character of separation was divided into three groups: honorable, less than honorable, and entry level separation. Significant differences on the HOI were found among the types of separation, with the entry level separation (ELS) group accounting for the significant difference, as seen in Table 2.

A correlation analysis, corrected for unreliability, showed the HOI accounted for 36% of the predictive efficiency for character of separation.

The Gordon Personal Profile Inventory.

Means of the Gordon subscales were significantly different for graduates vs discharges from BMT. Table 1 depicts these results. As shown in Table 2, honorable discharges had significantly different average scores on the four subscales compared to ELS. There was a nonsignificant trend for less than honorable discharge average scores to be lower than honorable and higher than ELS for all subscales. Due to the small number of recruits who have so far taken the Gordon and received a less than honorable discharge (N=14), these results were not reported in the table.

393


Table 1

HOI, Gordon, and BMT Performance

MEASURE                 GRADUATED      DISCHARGED        F         T-TEST

HOI                     (N=158,671)    (N=12,011)
  MEAN                    3.1702         5.0417        3.01***    41.18***
  SD                      2.804          4.868

GORDON                  (N=2760)       (N=920)
  SOCIAL ASCENDANCY
    MEAN                 57.0000        26.2413        1.18**     26.63***
    SD                   32.318         20.780
  RESPONSIBILITY
    MEAN                 57.0000        22.5359        1.47***    32.91***
    SD                   32.430         26.703
  EMOTIONAL STABILITY
    MEAN                 46.1513        14.5707        1.85***    31.64***
    SD                   32.300         23.829
  SOCIAL GREGARIOUS
    MEAN                 51.1224        25.8902        1.21***    22.39***
    SD                   31.755         28.861

** p < .01   *** p < .001

Table 2

HOI, Gordon, and Character of Separation

                        HONORABLE      LTH¹            ELS                F

HOI                     (N=21,641)     (N=608)         (N=15,603)
  MEAN                    3.42*          3.66*           5.56         1462.15***
  SD                      3.04           3.08            4.64

GORDON                  (N=333)                         (N=1045)
  SOCIAL ASCENDANCY
    MEAN                 52.04*                          20.63           61.06***
    SD                   33.24                           31.70
  RESPONSIBILITY
    MEAN                 54.02*                          25.76          112.69***
    SD                   32.66                           28.98
  EMOTIONAL STABILITY
    MEAN                 43.67*                          17.45          112.56***
    SD                   32.54                           26.08
  SOCIAL GREGARIOUS
    MEAN                 45.38*                          28.56           37.70***
    SD                   32.81                           30.13

* significantly different from ELS, p < .001     *** p < .0001
¹ These results are not reported for the Gordon because only 14 recruits have
  taken the Gordon and received a Less than Honorable Discharge.
ELS = Entry Level Separation     LTH = Less Than Honorable

394



The Minnesota Multiphasic Personality Inventory (MMPI).

Table 3 shows the means and standard deviations for the validity and clinical scales of the MMPI by gender and BMT performance. For males, average differences across all scales were statistically significant (p < .001) and T profiles were clinically meaningful. For females, there were no significant differences on one of the validity indexes, L, or on scale 8, mania. All other measured indices were significant at the .01 level.

Table 4 shows the means and standard deviations for the validity and clinical scales of the MMPI by gender and character of separation. For males, only Scales 8, L, and K did not significantly distinguish between ELS and honorable discharge (p < .001). For females, there were no significant differences among

Table 3

MMPI and BMT Performance

                        MALES                                    FEMALES
SCALE        GRADUATED   DISCHARGED      T          GRADUATED   DISCHARGED      T
             (N=734)     (N=688)                    (N=102)     (N=118)

L   MEAN       4.35         3.62        5.98**        4.06         3.56        1.64
    SD         2.31         2.50                      2.29         2.23
F   MEAN      10.20        17.24       15.83**        9.08        15.64        6.30**
    SD         7.00         9.49                      6.06         9.23
K   MEAN      11.63         9.93        7.24**       11.92        10.15        2.76*
    SD         4.07         3.97                      5.01         4.39
Hs  MEAN       9.86        15.62       15.11**       11.21        17.44        6.33**
    SD         6.80         7.52                      6.78         7.82
D   MEAN      24.15        31.20       17.31**       25.46        32.99        7.88**
    SD         7.61         7.74                      6.16         7.99
Hy  MEAN      21.79        27.08       15.83**       24.20        29.42        6.28**
    SD         5.98         6.58                      5.68         6.67
Pd  MEAN      23.43        27.37       12.25**       24.28        27.31        3.66**
    SD         6.11         6.05                      5.88         6.41
Mf  MEAN      24.96        27.18        8.27**       34.94        36.92        3.01*
    SD         5.11         5.02                      5.15         4.53
Pa  MEAN      13.36        17.34       13.69**       13.11        16.34        4.75**
    SD         5.28         5.65                      5.07         4.98
Pt  MEAN      22.52        31.39       15.07**       24.03        31.75        5.16**
    SD        11.76        10.45                     11.45        10.63
Sc  MEAN      23.48        35.15       15.72**       24.33        33.76        5.06**
    SD        13.55        14.38                     12.41        15.20
Ma  MEAN      21.32        22.42        4.09**       21.24        21.46        0.36
    SD         4.72         5.40                      4.65         4.50
Si  MEAN      32.78        41.95       13.33**       32.74        42.61        5.71**
    SD        13.29        12.64                     12.20        13.44

* p < .01    ** p < .001

395


Table 4

MMPI and Character of Separation

                          MALES                              FEMALES
SCALE          HONORABLE      ELS           F        HONORABLE      ELS           F
               (N=145)        (N=735)                (N=26)         (N=126)

L   MEAN         4.2897        5.6857      4.43        3.9231        3.5317      0.39
    SD           2.1243        2.2646                  1.9167        2.2043
F   MEAN        10.6138       16.8395     28.7291**   10.0385       15.2063      4.7042
    SD           6.7373        9.4024                  5.0713        9.1933
K   MEAN        11.2690       10.0027      6.5935     10.4231       10.2063      1.7106
    SD           4.6685        4.0420                  3.4195        4.3567
Hs  MEAN         9.0552       15.3537     45.3225**   11.5385       17.1905      6.3549
    SD           6.0882        7.5637                  5.7218        7.8798
D   MEAN        23.2759       30.8204     57.9092**   26.5000       32.5873      6.9555
    SD           6.7757        7.0753                  5.4498        8.0987
Hy  MEAN        21.3241       26.8381     44.9178**   24.1538       29.2540      6.7660
    SD           5.7275        6.5806                  5.7667        6.7539
Pd  MEAN        23.4897       27.2327     23.6745**   25.6538       27.0079      2.2758
    SD           5.6349        6.0954                  4.0094        6.5976
Mf  MEAN        24.3862       27.0544     17.4523**   35.4231       36.8254      1.0651
    SD           4.9261        5.1062                  4.7428        4.5274
Pa  MEAN        12.8552       17.1537     35.4004**   12.8462       16.2063      5.6570
    SD           5.4250        5.6698                  4.5316        5.0045
Pt  MEAN        21.1034       30.8762     50.8987**   25.7692       31.4444      4.3895
    SD          10.5795       10.7211                 10.0332       10.7804
Sc  MEAN        22.6207       34.6014     43.2064**   25.5385       33.3571      4.0622
    SD          12.7247       14.5042                 11.0099       15.1385
Ma  MEAN        21.5793       22.4503      1.8347     20.6923       21.4762      1.3661
    SD           4.5760        5.3862                  3.6306        4.5532
Si  MEAN        32.8062       41.3429     28.6511**   36.8846       41.9683      2.0088
    SD          11.4038       12.9246                 10.8271       13.6510

* = p < .01     ** = p < .0001

396


the scales based on character of separation. As only eleven males and one female<br />

who had taken the MMPI had received a less than honorable discharge, this category<br />

was not included in the analysis.<br />

CONCLUSIONS<br />

It is concluded that the HOI as the first part of a psychiatric screening<br />

inventory to predict BMT performance is both reliable and valid. It also predicts<br />

character of separation, effectively contrasting those who receive entry level<br />

separations from those who are honorably discharged or those who are less than<br />

honorably discharged.<br />

Current research on the AFMET will determine the predictive validity,<br />

reliability, and clinical meaningfulness of all aspects of AFMET in

relationship to Basic <strong>Military</strong> Training, technical school performance,<br />

unfavorable information, eligibility for promotion, and character of<br />

separation. Based on these findings the AFMET will be revised and refined to<br />

increase predictive and clinical efficacy.<br />

REFERENCES<br />

Bloom, W. (1977). Air Force Medical Evaluation Tests. USAF Medical Service Digest, 28, 17-20.

Bloom, W. (1980). Air Force Medical Evaluation Test (AFMET) Identifies Psychological Problems Early. USAF Medical Service Digest, 31, 8-9.

Bloom, W. (1983). Changes made, lessons learned after mental health screening. Military Medicine, 148, 889-890.

Butters, M., Retzlaff, P., & Gibertini, M. (1986). Non-adaptability to basic training and the Millon Clinical Multiaxial Inventory. Military Medicine, 151, 574-576.

Crawford, L. (1990). Development and Current Status of USAF Mental Health Screening. Manuscript submitted for publication.

Guinn, N., Johnson, A., & Kenton, J. (1975). Screening for Adaptability to Military Service (AFHRL-TR-75-30). Brooks AFB, TX: Training Systems Division, Air Force Human Resources Laboratory.

Laabs, G., Trent, T., & Quenette, M. (1989). The adaptability screening program: An overview. Proceedings of the 31st Annual Conference of the Military Testing Association, 434-439.

McCraw, R., & Bearden, D. (1988). Motivational and demographic factors in failure to adapt to the military. Military Medicine, 6, 325-328.

Spielberger, C. & Barker, L. (1979). The Relationship of Personality Characteristics to Attrition and Performance Problems of Navy and Air Force Recruits (Contract No. MDA 903-77-C-0190). Orlando, FL: US Navy Training Analysis and Evaluation Group.

Trent, T., Quenette, M., & Laabs, G. (1990, August). An Alternative to High School Diploma for Military Enlistment Qualification. Paper presented at the 98th Annual Convention of the American Psychological Association, Boston, MA.

397


Implementation of the Adaptability Screening Profile (ASP)*<br />

Thomas Trent, Mary A. Quenette, and Gerald J. Laabs<br />

<strong>Testing</strong> Systems Department<br />

Navy Personnel Research & Development Center²

San Diego, California<br />

At last year’s MTA symposium concerning the implementation of a biographical<br />

instrument (Adaptability Screening Profile/ASP) into military enlistment screening (Sellman,<br />

1989), we described technical issues (Trent, 1989), data analysis plans (Waters & Dempsey,<br />

1989), a methodology for controlling item response distortion (Hanson, Hallam & Hough,<br />

1989), and plans for accelerated implementation (Laabs, Trent & Quenette, 1989). While we<br />

made considerable progress towards these stated goals, the operational start and field test of

the ASP has been postponed while the Armed Services review implementation options. This<br />

paper summarizes ASP objectives and updates the research results. In addition, unresolved<br />

implementation issues and preliminary plans for the development of a new Department of<br />

Defense (DOD) enlistment screening algorithm are described.<br />

The Problem Revisited<br />

Since World War II, the Services' manpower and personnel research laboratories have

conducted research on a variety of biographical and other noncognitive assessments for<br />

personnel screening (Laurence & Means, 1985). Nonetheless, the quota restriction that the<br />

Services place on the proportion of non high school graduates has operated as the primary<br />

attrition controlling screen. As an increasing number of high school “dropouts” earn<br />

alternative education credentials (e.g., adult school, high school equivalency certificate,<br />

certificates of attendance, and occupational programs), the U.S. Congress and advocacy groups,<br />

such as the American Council on Education, have requested DOD to augment educational<br />

enlistment criteria with a screening instrument that measures attributes of the individual<br />

applicant that are related to adaptation to military life and the probability of completing initial<br />

obligated service.<br />

Opposition to basing enlistment eligibility on educational group membership has<br />

intensified since a 1987/1988 DOD classification of educational credentials into three eligibility<br />

tiers. Table 1 shows that attrition during the first year of enlistment varies considerably across

and within the tiers by type of education credential. While Tier I applicants are given highest<br />

priority for enlistment³, the attrition rates for adult schoolers (23.6%) and recruits with one

*Paper presented at the 32nd Annual Conference of the Military Testing Association at Orange Beach, Alabama, November, 1990.

²The opinions expressed in this paper are those of the authors, are not official, and do not necessarily represent those of the Navy Department.

³The relatively small numbers of Tier II & Tier III non high school graduate applicants who are selected must also score considerably higher on the Armed Services Vocational Aptitude Battery.

398<br />

.


school diploma graduates (10.6%, 14.3%, and 13.5%, respectively).

Procedures<br />

Table 1

Twelve Month Attrition Rates by Education Level
DOD Fiscal Year 1988 Accessions¹

Tier/Education Level                        Number of      Percent
                                            Accessions     Attrition
Tier I
  High School Graduate                        235,388        13.5
  College
    One Semester                                2,092        21.1
    2 Yrs or more                               6,228         8.0
  Adult Education                                 275        23.6

Tier II
  H.S. Equivalence Certificate                  9,843        23.8
  Occ. Program Certificate                         98        14.3
  H.S. Certificate of Attendance
    or Completion                               1,018        19.8
  Correspondence                                   87        24.1
  Home Study                                       47        10.6

Tier III
  No H.S. Diploma                               5,350        26.6

¹Non-prior service, active duty, N = 260,426.

Two alternate forms of the ASP (Part 1) were developed, each consisting of 50 items in multiple choice format with two to five response options. The items sampled constructs representing delinquency, academic achievement, career and work orientation, athletic involvement, and social adaptation. Item option scoring weights were developed utilizing Guion's (1965) "horizontal percent" method in a randomly assigned scale construction sample (N = 26,857, Army, Navy, Air Force, and Marine Corps combined samples). This resulted in a three-point item scale and a single total summed score. In a national sample of military applicants (N = 120,175), the mean item reliability (item to total score correlation) was .21 and the estimates of internal consistency were .76 and .74 (coefficient alpha for the two forms). The predictive validity of the ASP was compared to the following measures: Armed Forces Qualification Test (AFQT); education credentials (2 years college, high school diploma, high school equivalency certificate/GED, and no secondary credential); employed at time of application; 17 years of age at time of service entry; and eligibility waiver status as a result of preservice misdemeanor or felony arrests.
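The option-weighting idea can be sketched as follows, but only as an assumed reading of the "horizontal percent" approach: for each response option, the percent of the people choosing it who later completed their term is computed, and options are then banded into a small number of integer weights. The cut points, variable names, and banding rule below are invented for illustration and are not taken from the ASP scoring key.

```python
from collections import defaultdict

def horizontal_percent_weights(options_chosen, completed, cuts=(40.0, 60.0)):
    """Assumed sketch of horizontal-percent option weighting for one item.
    options_chosen: the option each applicant picked; completed: 1 if the
    person finished the first term, 0 if attrited.  Options are weighted
    0/1/2 by the completion rate among the people who chose them; the
    cut points are illustrative only."""
    counts = defaultdict(lambda: [0, 0])              # option -> [n, n_completed]
    for opt, done in zip(options_chosen, completed):
        counts[opt][0] += 1
        counts[opt][1] += done
    weights = {}
    for opt, (n, n_done) in counts.items():
        pct = 100.0 * n_done / n                      # the "horizontal" percent for this option
        weights[opt] = 0 if pct < cuts[0] else (1 if pct < cuts[1] else 2)
    return weights

# Hypothetical responses to one item from six applicants.
print(horizontal_percent_weights(["A", "B", "A", "C", "B", "A"], [1, 0, 1, 1, 0, 0]))
```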


399<br />

.


The criterion is a dichotomous measure of attrition. Personnel who were voluntarily<br />

or involuntarily discharged from service prior to the completion of their service contracts were<br />

coded “I “. Those personnel with medical disability, officer schoo! discharges, service breach<br />

of contract, and the dead were excluded from analysis. All other personnel were coded as “0”.<br />

The biodata instrument (ASP-I) was administered to all active duty military applicants<br />

in the United States for a three month period (N = 120,175). The sample utilized in the<br />

following analyses consisted of 55,675 personnel who enlisted after the applicant<br />

administration. The applicant and accession samples were generally representative of military<br />

populations (Trent, Quenette, Ward & Laabs, 1990).<br />

Results<br />

Figure 1 graphically portrays average attrition rates at each of the biodata raw score points.

Figure 1. Attrition rates by ASP-1 score. [The original graphic plots average attrition rate against ASAP raw score; the plotted curve is not legible in the source.]

Table 2 shows the simple and incremental validities with the biodata score (ASP-l)<br />

forced into the regression equation last. This analysis was performed on a random one-half<br />

of the sample (“model construction” group; E. = 26,991). Aside from ASP-l and AFQT, the<br />

predictor variables were dummy coded. Validities for high school diploma, two or more years<br />

of college, AFQT, age 17, no credential, GED, and ASP were corrected for restriction of range<br />

using a univariate formula (Thorndike, 1982). Validities for employment status and<br />

misdemeanor/felony were not corrected because operational selection procedures resulted in

larger accession sample variances as compared to applicant sample variances.<br />

The true

unrestricted variance of the misdemeanor/felony measure is unknown since most potential<br />

applicants in this category are screened out at the recruiter level and do not reach the applicant<br />

testing stage.<br />
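The univariate correction referenced above has the familiar form r_c = r(S/s) / sqrt(1 - r² + r²(S/s)²), where S is the unrestricted (applicant) standard deviation of the predictor and s is the restricted (accession) standard deviation. A small sketch with made-up numbers:

```python
import math

def correct_for_range_restriction(r, sd_unrestricted, sd_restricted):
    """Univariate correction for direct range restriction on the predictor
    (the classic Thorndike Case II formula)."""
    ratio = sd_unrestricted / sd_restricted
    return (r * ratio) / math.sqrt(1.0 - r * r + r * r * ratio * ratio)

# Hypothetical values: observed validity -.20 in the accession sample,
# applicant SD 25, accession SD 18.
print(round(correct_for_range_restriction(-0.20, 25.0, 18.0), 3))
```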

400<br />

.


criterion.(-.09).<br />

Table 2

ASP-1 Incremental Validity - DoD Sample (a)

                              Zero-order   Corrected                Incremental Change
Step (e)  Variable (b)          r (c)        r (d)       R     R2        F         p

1. HS Diploma                  -.14         -.19        .14   .021     565.0     .000
2. 2 Years College             -.03         -.04        .17   .030     272.9     .000
3. Employed                    -.09                     .19   .036     152.5     .000
4. AFQT Percentile             -.06         -.07        .20   .039      92.8     .000
5. Misdemeanor/Felony           .04                     .20   .040      25.3     .000
6. No Credential                .13          .17        .20   .041      23.1     .000
7. GED                          .09          .10        .20   .041      13.8     .000
8. Age 17                       .05          .07        .20   .042       9.2     .002
9. ASP-1                       -.25         -.27        .27   .073     912.0     .000

(a) DOD Accessions, Model Construction Group, N = 26,991.
(b) All predictor variables are indicator variables (dummy 0/1 coded) except ASP-1 and AFQT scores.
(c) All correlations are significant at .05 level.
(d) Correlations (validities) corrected for restriction of range (univariate correction; Thorndike, 1982).
(e) Order of entry of variables in steps 1-8 was determined by prior stepwise procedure. ASP-1 was forced into the equation last.

Conclusions and Implementation Issues<br />

In the research mode, the use of the Adaptability Screening Profile for enlistment<br />

screening demonstrated incremental validity in addition to operational screens and other<br />

potential measures to minimize attrition and to improve the match between the demands of<br />

military service and the background and temperament of individuals. The utility of employing<br />

the ASP will vary as a function of the selection ratio* and the stability of the ASP in<br />

operational mode (see Trent, et al. 1990 for a more complete discussion of ASP utility).<br />

The research results support the contention of the American Council on Education that<br />

alternatives to the existing three-tier educational quota system are technically feasible. On<br />

the other hand, educational attainment has a proven track record of good predictive validity

and is in fact one of the most reliable of the biographical measures. From a technical<br />

perspective, type of education credential should be included in an array of adaptability<br />

indicators that samples the “whole person.” The approach of the ASP research program has<br />

been to operationalize constructs related to individuals’ adaptability to institutions in general<br />

and the likelihood of persistence in military training and occupations in particular. The

biodata score resulting from the ASP is an economical method of capturing personal<br />

background data. In addition, a new research effort is underway at the Navy Personnel<br />

*The proportion of qualified recruits needed to meet manpower goals to the total number of military applicants.

401<br />

.


Research and Development Center and the Human Resources Research Organization to<br />

construct a DOD attrition prediction model that could be used in a “compensatory” enlistment<br />

eligibility system (Laurence & Gribben, 1990). In such an algorithm the applicant’s qualifying<br />

score would be determined by a combination of measures such as aptitude test scores and<br />

personal background data, including educational achievement, criminal justice history, and<br />

employment history. The validity of this proposed screening model, as well as plans for DoD<br />

implementation, is planned for presentation at next year’s MTA conference.<br />

Two related issues have stalled the field test of the ASP. In that the principal objective

of the operational test was to evaluate the performance of the self-reported biodata in an<br />

operational mode, eligibility cutting scores were established to eliminate the bottom 10 percent<br />

of otherwise qualified applicants. This was a necessary condition to gain a realistic<br />

environment of recruiter coaching and applicant dissimulation to test for operational score<br />

inflation and possible validity degradation. The prospect of rejecting high school diploma<br />

graduates, especially in the upper “mental groups,” proved to be extremely unpopular among<br />

the Services. Secondly, the DOD is considering the feasibility of avoiding the “multiple<br />

hurdle” impact of the ASP field test by implementing the instrument within the new<br />

compensatory screening algorithm that is under development. Thus, the initial efficacy of the<br />

ASP would rely upon validity estimates from the non-operational administration (N = 120,175).

Until score monitoring provides operational data, the uncertainty about the impact of recruiter<br />

coaching and applicant “faking good” on score distributions and predictive validity will remain<br />

unresolved. At present, the ASP relies upon empirical scoring and verification warning<br />

statements to minimize score inflation. Moreover, experimental studies (e.g., Trent, Atwater<br />

& Abrahams, 1986; Trent, 1987; Hough, Eaton, Dunnette, Kamp & McCloy, 1990) indicate<br />

that the problem of item response distortion is minimal. That is, applicants’ responses do not<br />

demonstrate extreme distortion and validities of biodata instruments are not seriously moderated<br />

by distortion.<br />

REFERENCES<br />

Guion, R. M. (1965). Personnel testing. New York: McGraw-Hill.

Hanson, M. A., Hallam, G. L., & Hough, L. M. (1989, November). Detection of response distortion in the Adaptability Screening Profile (ASP). Paper presented to the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.

Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-related validities of personality constructs and the effect of response distortion on those validities. Journal of Applied Psychology, 75 (5).

Laabs, G. J., Trent, T., & Quenette, M. A. (1989, November). The Adaptability Screening Program: An Overview. Paper presented at the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.

Laurence, J. H., & Means, B. (1985, July). A description and comparison of biographical inventories for military selection (FR-PRD-85-5). Alexandria, VA: Human Resources Research Organization.

Laurence, J. H., & Gribben, M. A. (1990, July). Military selection strategies (FR-PRD-90-15). Alexandria, VA: Human Resources Research Organization.

Sellman, W. S. (1989, November). Implementation of biodata into military enlistment screening. Symposium presented at the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.

Thorndike, R. L. (1982). Applied psychometrics. Boston, MA: Houghton-Mifflin Company.

Trent, T. (1987, August). Armed forces adaptability screening: The problem of item response distortion. Paper presented at the American Psychological Association Convention, New York, NY.

Trent, T. (1989, November). The Adaptability Screening Profile: Technical Issues. Paper presented at the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.

Trent, T., Atwater, D. C., & Abrahams, N. M. (1986, April). Experimental assessment of item response distortion. In Proceedings of the Tenth Psychology in the DoD Symposium. Colorado Springs, CO: U.S. Air Force Academy.

Trent, T., Quenette, M. A., Ward, D. G., & Laabs, G. J. (1990). Armed Service Applicant Profile (ASAP): Development and validation (in review). San Diego, CA: Navy Personnel Research and Development Center.

Waters, B. K., & Dempsey, J. R. (1989, November). Development of the Adaptability Screening Profile score monitoring system. Paper presented to the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.

403<br />

.


FOR U.S. NAVY TYPING PERFORMANCE TESTS

MASTER CHIEF YEOMAN STEVE D. MCGEE, USN

NAVAL EDUCATION AND TRAINING PROGRAM MANAGEMENT SUPPORT ACTIVITY

The Department of the Navy is considering the use of word processors/personal computers in typing performance tests. Presently these tests are accomplished utilizing electric typewriters. This report presents results of a study to determine the feasibility of using word processors/personal computers versus the electric typewriter for typing performance tests.

Purpose<br />

The purpose of the study is to determine if typing performance tests could be performed with word processors/personal computers, thereby speeding up word production as well as accuracy.

Methodology

Subjects were enlisted U.S. Navy personnel (E-1 through E-6) within the administrative and supply communities that require typing performance tests. Subjects were randomly selected from throughout the Navy. All of the subjects had prior keyboard experience on the typewriter, and a subset of the subjects had experience with the word processor/personal computer in their normal day-to-day work.

Two official U.S. Navy typing performance tests were used from series 87 published by NETPMSA. The standard electric typewriters (IBM Selectric, Selectric II, and Selectric III) were utilized for the typewriter portion of both exams, while the word processor/personal computer portion of the exams was administered using the XEROX 13621, IBM PC, WANG PC, ZENITH 245, and CPT.

On day one, the subjects were administered Test A and timed for five minutes using the typewriter. On the same day, they were administered Test A and timed for five minutes using the word processor/personal computer. On day two, procedures were reversed; for example, the subjects were administered Test B and timed for five minutes using the word processor/personal computer. They then were tested using the typewriter. Scoring was done by line at a rate of five keystrokes per word, with errors subtracting keystrokes from the total strokes.



Subjects were allowed to use the automatic wrap-around (automatic return) and backspace features on the word processor/personal computer. All tests were properly monitored by local command supervisors.
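The scoring rule can be sketched roughly as below; the source is garbled where it states the per-error deduction, so the penalty is left as an explicit parameter rather than a claimed value, and the example numbers are made up.

```python
def net_words_per_minute(total_keystrokes, errors, minutes=5,
                         strokes_per_word=5, penalty_strokes_per_error=5):
    """Net WPM: keystrokes typed, minus an assumed per-error keystroke
    penalty, converted at five keystrokes per word over the timed period.
    The penalty value is an assumption, not a figure from the source."""
    net_strokes = max(total_keystrokes - errors * penalty_strokes_per_error, 0)
    return (net_strokes / strokes_per_word) / minutes

# Example: 1,100 keystrokes in five minutes with 4 errors.
print(net_words_per_minute(1100, 4))   # 43.2 net WPM under the assumed penalty
```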

Results

The results of the administered tests showed that the average word per minute (WPM) production was 42.5 using a word processor/personal computer and 35.5 WPM using an electric typewriter. Table 1 is an illustration of these results by Test A and Test B. A repeated measures analysis of variance was conducted on the word processor/personal computer and typewriter words per minute data. Post-hoc tests on the data indicate that: (a) for Test A the subjects performed significantly better on the word processor/personal computer than on the electric typewriter; (b) for Test B the subjects performed significantly better on the word processor/personal computer than on the electric typewriter; (c) for either test, subjects performed equally well using the word processor/personal computer; and (d) using the typewriter, subjects performed significantly better on Test B than on Test A. This is illustrated at Table 2, showing a breakdown by paygrade.

The test results demonstrate that productivity is increased by an average of 7 WPM using a word processor/personal computer. Therefore, it would prove advantageous for the U.S. Navy to allow the word processor/personal computer to be used for typing performance tests. Additionally, it is recommended that the word per minute requirement be increased by 20% for the Yeoman rating, since they are the individuals who accomplish the majority of the Navy's text typing as opposed to form typing. Furthermore, if the word production requirement for the Yeoman rating is increased by 20%, then the wrap-around and backspace features should be authorized, since this is a feature that is utilized on a day-to-day basis by these typists.

405




Acute High Altitude Exposure and Exercise<br />

Decrease Marksmanship Accuracy<br />

W.J. Tharion, B.E. Marlowe, R. Kittredge,<br />

R. Hoyt and A. Cymerman<br />

United States Army Research Institute of Environmental Medicine<br />

Natick, Massachusetts 01760<br />

ABSTRACT<br />

Many moderate to high altitude areas occupy militarily

strategic parts of the world. This study quantified the<br />

effects of endurance exercise, acute altitude exposure (AAE)<br />

and extended altitude exposure (EAE) (16 days at 4300 m), on<br />

marksmanship performance. Sixteen experienced male marksmen<br />

fired a de-militarized M-16 rifle equipped with a Noptel ST-<br />

1000 laser system from a standing unsupported position at a<br />

2.3 cm diameter circular target from a distance of 5 m.<br />

Subjects were tested at rest and after a maximal 20.4 km<br />

run/walk ascent from 1800 m to 4300 m, following AAE and EAE.<br />

Sighting time (the interval between a signal light to fire and<br />

trigger pull) and accuracy (distance of shot impact from<br />

target center) were measured. Exercise and time at altitude<br />

had independent effects on marksmanship. Sighting time was<br />

unaffected by exercise, but was 8% longer following EAE (5.61 ± 1.25 sec AAE vs 6.06 ± 1.06 sec EAE (mean ± SD), p<.05). Accuracy was reduced 11% by exercise (3.63 ± 0.69 cm at rest vs 4.01 ± 0.89 cm post exercise, p<.05).



Subjects

Sixteen soldiers, 18-39 years of age, volunteered for the study.

Subjects were not from nor had they lived during the three months prior<br />

to the study at altitudes greater than 1500 m. All subjects were<br />

experienced marksmen prior to study participation.<br />

Equipment

Marksmanship performance was quantified with a Noptel ST-1000 (Oulu,<br />

Finland) laser marksmanship system. The system consists of a laser

transmitter attached to a de-militarized M-16 rifle, a laser switch, an<br />

optical target, a personal computer, printer, and software provided by<br />

Noptel.

TABLE 1. TESTING SCHEDULE FOR MARKSMANSHIP MEASURES.

DAYS 1-5 SEA LEVEL<br />

Days 1-4 Marksmanship Training

Day 5 Marksmanship Assessment<br />

DAYS 6-23 4300 M ALTITUDE<br />

Day 6 Marksmanship Assessment, Acute Altitude Exposure, Fatigued State<br />

Days 7-9 Marksmanship Assessment, Acute Altitude Exposure, Rested State

Days 10-19 No Testing

Days 20-22 Marksmanship Assessment, Extended Altitude Exposure, Rested State<br />

Day 23 Marksmanship Assessment, Extended Altitude Exposure, Fatigued State<br />

Procedure<br />

The schedule of testing is shown in Table 1. On Day 6, subjects<br />

ascended (2500 m vertical ascent) 21 km to the summit of Pikes Peak (4300

m) as quickly as possible. Within 5 minutes upon completion of the<br />

ascent marksmanship was assessed. Subjects then resided for 16 days at<br />

the summit. On Day 23, subjects were returned to the base of Pikes Peak<br />

for a second ascent and subsequent marksmanship assessment. Each<br />

marksmanship test consisted of a total of 20 shots. Subjects were<br />

instructed to shoot at will for the first ten shots to obtain the best<br />

accuracy score possible. For the second ten shots, subjects were

instructed to shoot as fast as possible without sacrificing accuracy<br />

(speed and accuracy). During the latter assessment, subjects were<br />

required to hold the barrel of the rifle below their waist. Following<br />

a verbal ready signal and a 1-10 sec randomly-varied preparatory

interval, subjects were signalled to shoot upon illumination of a red<br />

stimulus light. Subjects shot in the free standing unsupported position<br />

from a distance of 5 m at a 2.3 cm diameter circular target.
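The Noptel software reports the marksmanship measures directly; the sketch below is only a hypothetical illustration of how the two dependent measures described here, accuracy (distance of shot impact from target center) and sighting time (signal light to trigger pull), plus a simple shot-group tightness index, could be computed from per-shot records. The coordinate convention and the dispersion-about-centroid definition of tightness are assumptions, not the Noptel algorithms.

import numpy as np

def marksmanship_summary(shots_xy_cm, light_on_s, trigger_s):
    # shots_xy_cm: (n, 2) shot impact coordinates with the target center at (0, 0).
    shots = np.asarray(shots_xy_cm, dtype=float)
    dcm = np.linalg.norm(shots, axis=1).mean()              # mean distance from target center
    centroid = shots.mean(axis=0)
    sgt = np.linalg.norm(shots - centroid, axis=1).mean()   # dispersion about the shot-group centroid
    sighting = (np.asarray(trigger_s) - np.asarray(light_on_s)).mean()  # mean sighting time (s)
    return dcm, sgt, sighting

# Hypothetical three-shot record.
print(marksmanship_summary([(1.2, -0.5), (0.8, 0.9), (-1.5, 0.3)],
                           light_on_s=[0.0, 10.0, 20.0],
                           trigger_s=[5.4, 16.1, 25.9]))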

RESULTS<br />

A significant effect of altitude condition was observed for distance<br />

409<br />


from center of mass (DCM) (p<.03). Post-hoc t-test analysis revealed that DCM for the accuracy-only test was greater (p<


The effects of both altitude exposure and fatigue on the various<br />

marksmanship parameters are summarized in Table 2. When shooting<br />

exclusively for accuracy, significant differences assessed via ANOVA<br />

existed for DCM (p<.01) and shot group tightness (SGT) (p<.02). Acute

altitude exposure elicited a greater DCM and a more dispersed shot group<br />

than at sea level or after extended altitude exposure. When shooting for<br />

both speed and accuracy, DCM (p<



at altitude, shooters also fired more quickly but less accurately. He suggests feelings of sickness and increased physical

symptomatology (acute mountain sickness) experienced in the first few<br />

days of altitude exposure lead to lowered motivation to perform well,<br />

presumably because of one's preoccupation with bodily discomfort. It is<br />

also possible that subjects become impatient trying to maintain a good<br />

aiming point with increased body sway encountered at altitude (Fraser,<br />

Eastman, Paul and Porlier, 1987). They may then shoot prematurely,<br />

resulting in the decrease in sighting time. It is speculated that<br />

subjects may feel that taking additional sighting time would not improve<br />

their accuracy. Another possibility may be that subject's time<br />

estimation is affected. Time may seem to pass more quickly than it<br />

actually does.<br />

Upon acclimatization to altitude, individuals took 8% longer (Acute<br />

Altitude Exposure 5.61 sec vs Extended Altitude Exposure 6.06 sec [means

of rested and fatigue conditions combined]) to sight the target. The<br />

extra time apparently enables increased accuracy of shooting. Increased<br />

respiratory rate is among the physiological adaptations that occur with<br />

acute exposure to altitude, the faster the respiratory rate the more<br />

breaths that are missed during the breath-holding phase of aiming and<br />

pulling the trigger. This may increase discomfort associated with<br />

breath-holding and thereby decrease sighting time.<br />

While shooting at altitude, DCM, a measure of accuracy was 11%<br />

greater after exercise (4.01 cm) than for the rested condition (3.63 cm)<br />

[means of acute and extended altitude exposures combined]. Sighting time<br />

was not affected by fatiguing exercise. In contrast to the present<br />

results, Evans (1966) found accuracy was not affected by fatigue but<br />

firing latency was. Other previous findings proposed increased body sway<br />

after exercise as an explanation for reduced shooting accuracy of<br />

soldiers after a forced march (Knapik, Bahrke, Staab, Reynolds, Vogel and<br />

O'Connor, 1990), and biathletes after cross country skiing (Niinimaa and<br />

McAvoy, 1983). Increases in heart rate resulting from intense aerobic<br />

exercise also may impair shooting proficiency. Heart rate control by<br />

beta-blockers (Kruse, Ladefoged, Nielsen, Paulev, and Sorenson, 1986;<br />

Siitonen, Sonck and Janne, 1977) or biofeedback techniques (Daniels and<br />

Hatfield, 1981) are possible remedies.<br />

If military forces are to be prepared for deployment in a high<br />

terrestrial environment, it may be advantageous to have them training

routinely at high altitude. These results showed marksmanship accuracy<br />

returned to normal after two weeks residence at altitude. For events<br />

such as the biathlon and shooting competitions, athletes may benefit from<br />

both acclimation to altitude prior to competition and routine training<br />

at altitude.<br />

Daniels, F.S. & Hatfield, B. (1981). Biofeedback. Motor Skills: Theory Into Practice, 2, 69-72.

Dusek, E.R. & Hansen, J.E. (1969). Biomedical study of military<br />

performance at high terrestrial elevation. Military Medicine, 134, 1497-

1507.<br />

Evans, W.O. (1966). Performance on a skilled task after physical work<br />

or in a high altitude environment. Perceptual and Motor Skills, 2, 371-<br />

380.



Fraser, W.D., Eastman, D.E., Paul, M.A., & Porlier, J.A.G. (1987).

Decrement in postural control during mild hypobaric hypoxia. Aviation,<br />

Space and Environmental Medicine, 58, 768-772.<br />

Fulco, C.S. & Cymerman, A. (1988). Human performance and acute hypoxia.<br />

In Human Performance Physiology and Environmental Medicine at Terrestrial Extremes. KB Pandolf, MN Sawka, and RR Gonzalez (editors). Benchmark Press, Inc., Indianapolis, IN, pp. 467-495.

Knapik, J., Bahrke, M., Staab, J., Reynolds, K., Vogel, J., & O'Connor,

J. (1990). Frequency of loaded road march training and performance on a<br />

loaded road march. United States Army Research Institute of<br />

Environmental Medicine Technical Report. T13-90, pp. 18-25.<br />

Kruse, P., Ladefoged, J., Nielsen, U., Paulev, P.E., & Sorenson, J.P.

(1986). Beta-blockade used in precision sports: effect on pistol shooting<br />

performance. Journal of Applied Physiology, 61, 417-420.

Marlowe, B., Tharion, W., Harman, E., & Rauch, T. (1989). New<br />

computerized method for evaluating marksmanship from Weaponeer<br />

printouts. United States Army Research Institute of Environmental<br />

Medicine Technical Report. T30-90.<br />

Niinimaa, V. & McAvoy, T. (1983). Influence of exercise on body sway<br />

in the standing rifle position. Canadian Journal of Applied Sport

Science, 8, 30-33.<br />

Siitonen, L., Sonck, T., & Janne, J. (1977). Effect of beta-blockade on

performance: use of beta-blockade in bowling and shooting competitions.<br />

Journal of International Medical Research, 2, 359-366.

413



HUMAN PERFORMANCE DATA FOR COMBAT MODELS<br />

COLLINS, Dennis D., Department of the Army, The Pentagon,<br />

Washington, D. C.<br />

The conceptualization of any modern system requires early<br />

integration with its operational environment. The requirement<br />

for early systems integration is particularly important for<br />

military systems which are unique in that they must function<br />

against an enemy intent on their destruction. Survival in this<br />

environment is frequently the principal mission of the system<br />

and also its principal measure of effectiveness. It is the<br />

analytical merger of the conceptual system with its operational<br />

environment which defines both the objective and importance of<br />

military combat modeling.<br />

Current versions of systems development models are virtually<br />

all computer resident. Because of the complexity of systems<br />

development, plus the requirement for many repetitions modern<br />

combat models are best suited for an automated environment.<br />

Combat models differ from Computer Aided Design/ Computer Aided<br />

Manufacturing (CAD/CAM). CAD/CAM is used to conceptualize and<br />

manufacture a specific system. A systems development combat<br />

model, on the other hand, is used to demonstrate a system's<br />

performance in its anticipated wartime environment performing<br />

against its probable enemy. Combat Models are also unique in<br />

that both the system and the wartime environment are required to<br />

be speculative in order to estimate the probable reality at the<br />

time the system will actually perform its battlefield mission.<br />

Modern data systems provide the capability to view systems<br />

,operational performance early in design, allowing elimination of<br />

candidate concepts even before they leave the drawing board.<br />

This relatively new capability to observe "draft" or "notional"

systems inside a model of an operational environment presents<br />

not only new powers of design, but new problems as well. The<br />

process of systems development from design through testing now<br />

takes place inside a computer. Entire technology options and<br />

systems design concepts can be eliminated long before even<br />

drawings are completed. Traditional human factors engineering<br />

begins when the concept of a system is sufficiently firm to<br />

permit the design of at least a mock up of the man-machine<br />

interface such as a cockpit simulator. The combat model, however,<br />

has allowed the selection of first order military technologies<br />

and systems candidates completely inside the notional<br />

reality of a computer.<br />

Because the systems development combat model grew from<br />

analytical communities which were oriented to tactics and<br />

engineering, the representation of human performance parameters<br />

in the evolution of combat models was rarely considered. The<br />

impact of this evolution has been subtle. By omitting human<br />

414


factors from both enemy and friendly forces, the engineering<br />

modeler intended to deal with the amorphous area of human<br />

factors through a balanced omission: Since neither side showed<br />

human factors, the effect was balanced and should have had no<br />

effect on the tactical or engineering conclusions drawn from the<br />

model's output. In early tactical wargames and engineering<br />

models, this approach was reasonable because the computers of<br />

the day were functional only in aggregated, "low resolution"

modeling. Low resolution models provided valuable tactical<br />

insights, but little information about specific systems.<br />

Engineering models were also simple: one tank fired at another<br />

in a straightforward duel format.<br />

As wargames became automated, the ability to conduct<br />

high-resolution simulation allowed the tactical and engineering<br />

'modeling of actual systems in dynamic combat. Automation of<br />

wargames also made omission of human factors both unnecessary<br />

and problematic. "Balanced omission" of human factors in<br />

systems development combat models is more accurately described<br />

as the actual modeling of the human as 100% effective. By

failing to properly consider the human component of systems<br />

performance, the human has an assumed value of 100% effectiveness.<br />

It is generally accepted, even among combat modelers, that<br />

this assumption has the effect of exaggerating systems performance,<br />

and accelerating the tactical pace of a battle.<br />

The two original clients of the combat model, wargamers and

hardware engineers, have had an understandable lack of interest<br />

in representing the human factor component of systems performance<br />

as anything other than 1.0. Human performance parameters<br />

are still much less defined than hardware performance parameters,<br />

and no clear consensus has emerged as to how human factors<br />

should be modeled. The case for improving the representation of<br />

human factors in systems development combat models focuses on<br />

the impact of modeling humans as 100% effective. Notional<br />

systems over-perform and technologies and systems candidates are<br />

eliminated in an occult process long before their interaction<br />

with the human dimension can be measured.<br />

There are additional dimensions to the dilemma of human<br />

factors in combat models. Combat model proponents have a<br />

somewhat justified view of their critics as romantics who wax<br />

philosophical about the value of such human traits as leadership, morale and courage on the battlefield, but cannot

quantify these dimensions in order that they be shown as "independent<br />

variables" in the outcome of analytical combat.<br />

A proposed approach for change is outlined in figure 1. A<br />

first step would be to identify the combat models most often<br />

used in the design and selection of systems. While this step<br />

may appear obvious, there could be a drift toward models which<br />

have little impact on systems development, but are easily<br />

modified for human dimensions. Systems development models are

415


usually sophisticated engineering development models which do<br />

not lend themselves to human dimension integration. Subsequent<br />

steps, in turn, would be:<br />

-Select those systems for study which require "man-in-the-loop"

for optimal functioning. Good candidates for study are those<br />

systems which depend upon humans for the performance of critical<br />

functions. The intent, early in a human dimensions integration<br />

program, is to pick those systems for study which are likely to<br />

show the importance of human dimensions, even when only limited<br />

human performance is modeled.<br />

-Select human systems tasks which are currently modeled by<br />

implication (i.e. man as 1.0) and for which data can be obtained,<br />

such as "acquire target'*,"identify target", or "lock-on<br />

target and fire". When systems are conceived, their designers<br />

allocate some tasks to man, some to the machine and some to both<br />

man and machine. A combat aircraft, for example, might acquire<br />

a target automatically through the system itself, depend on its<br />

operator for correct identification and attack decision, then<br />

return control to the system for attack launch and execution. In<br />

some highly sophisticated design processes using elaborate task<br />

analysis this process is formal. More often it is informal.

Selection of human-critical tasks will, like the first step,<br />

increase the likelihood that human variance will have an<br />

independent-variable impact on model outcome.<br />

-Modify the selected model to allow replication of the discrete<br />

human functions selected. Actual model algorithms need not be<br />

complex. The initial modifications need only demonstrate that<br />

the human tasks selected do, in fact, influence the outcome of<br />

the analysis as shown by the measures of effectiveness. Modifying<br />

complex models to show the more discrete human functions<br />

such as suppressed action due to fear or diminished target<br />

acquisition due to cognitive overload is within our current<br />

capability. Some models already represent these functions to<br />

a degree.<br />

-Run the model with the human factors modifications using the<br />

best available data.<br />

-Compare the model output (systems exchange ratios, force<br />

exchange ratios, etc.) between the basic combat model and the<br />

human factors modification. At this point human performance<br />

can be observed in a quantified fashion which is both understandable<br />

and acceptable to the senior engineering design<br />

community.<br />

-Demonstrate the value of human factors algorithms in combat modeling through the (hopefully) significant differences between the basic and human factors modified model (a toy sketch of such a comparison follows below).
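To make the comparison step concrete, the toy sketch below (not any fielded Army model; all probabilities are invented) pits two systems in a simple stochastic duel and scales the friendly side's single-shot hit probability by a human-performance factor. Setting the factor to 1.0 reproduces the implicit "human is 100% effective" assumption; lowering it shows how the win ratio, a crude stand-in for an exchange ratio, shifts once human degradation is represented.

import random

def duel_win_ratio(p_hit_blue, p_hit_red, human_factor=1.0, trials=10_000):
    # Each round Blue fires, then Red; the first hit ends the engagement.
    p_blue = p_hit_blue * human_factor   # degrade Blue's hit probability by the human factor
    blue_wins = red_wins = 0
    for _ in range(trials):
        while True:
            if random.random() < p_blue:
                blue_wins += 1
                break
            if random.random() < p_hit_red:
                red_wins += 1
                break
    return blue_wins / max(red_wins, 1)

print(duel_win_ratio(0.4, 0.3, human_factor=1.0))   # baseline: perfect operator assumed
print(duel_win_ratio(0.4, 0.3, human_factor=0.7))   # operator degraded by, e.g., fatigue or overload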

416


Using those systems tasks which are frequently assigned to<br />

humans in systems design (identify friend-or-foe, for example),<br />

develop a plan for the collection of human performance task<br />

data:<br />

First, search for existing data with high human factors and

engineering community acceptance. In other words, use what we<br />

have first. This approach is particularly important early in<br />

the effort when the needs for combat modeling data are ill<br />

defined. Data collected without a good understanding of how it<br />

will be used is likely to go unused. As the process matures,<br />

the personnel data development and modeling communities will<br />

develop an understanding of one another's needs and a protocol<br />

for data communication will evolve.<br />

Second, develop data through the use of cost effective means<br />

such as developmental tests in training simulators. Since<br />

personnel data formats for combat models are likely to evolve,<br />

the costly process of test or field developed data is likely to<br />

be wasted due to inevitable changes. The new family of flight<br />

and vehicle simulators offers an excellent opportunity to<br />

collect human performance data for combat model input.<br />

Third, develop data through field operations research. An<br />

excellent example of this concept was the Fire Fighting Task<br />

Force study sponsored by the U.S. Army's Concepts Analysis<br />

Agency in Bethesda, Maryland. The Fire Fighting Task Force<br />

studied the psychological impact of stress caused by U. S. Army<br />

Infantry units fighting the Yellowstone National Forest fire in<br />

1988. This type of effort not only generates data for use in<br />

modeling, but contributes to our understanding of combat theory.<br />

Finally, loop early data development back to the human factors

modified model in order to demonstrate human factors as an<br />

independent variable in the outcome of combat and document those<br />

human variables which warrant further developmental research.<br />

This loop-back function will automatically develop personnel<br />

combat modeling data protocol as a by-product.<br />

417


Figure 1. A Paradigm for the Integration of Human Factors in Combat Models
(The original figure is a boxed flow diagram: ID models used in systems design/systems development -> refine selection for "man-in-the-loop" -> select tasks currently modeled -> modify selected models -> run selected models as modified -> develop data thru simulation -> develop data thru field collection.)

418


TRADING OFF PERFORMANCE, TRAINING, AND EQUIPMENT FACTORS TO ACHIEVE SIMILAR PERFORMANCE

Janet J. Turnage, University of Central Florida
Robert S. Kennedy, Essex Corporation
Marshall B. Jones, Pennsylvania State University

INTRODUCTION

<strong>Military</strong> systems performance is generally the joint outcome of the human<br />

interacting with the machine. It is convenient to think of this outcome in

terms of causal models, where the elements that determine or drive systems<br />

performance may be relegated to equipment characteristics, training variables,

and individual capabilities. We suggest that an appropriate starting place<br />

for such causal analyses is to select a specific level of desired operational<br />

or systems performance beforehand. We call this device level Isoperformance.

Then one can employ the different potential predictors of this outcome in an<br />

Isoperformance model. The way the model works is to select each variable as a<br />

potential predictor and then submit it to a trade-off methodology whereby each<br />

variable is compared in light of total operational proficiency desired.<br />

Examples of operational proficiency might be: (1) escape from an aircraft<br />

water crash within 60 seconds, (2) completion of a forced march carrying a<br />

36-pound pack for 20 miles within 6 hours, (3) an 80% carrier landing board

rate, or (4) control of 20 aircraft in the same airspace simultaneously.<br />

There are three meanings of the term “Isoperformance.” The first is a<br />

conceptual approach to human factoring. Second, the term may describe a<br />

curve, plotted against training time on the abscissa and aptitude on the<br />

ordinate. Third, Isoperformance is a specific interactive computer program.<br />

In this paper, we shall describe each of these features, in turn, and will<br />

present an illustration of one type of application.<br />

But first, let us more firmly specify the premise of Isoperformance. The premise is that the same (Iso) total systems efficiency (performance) is a function of trade-offs between personnel, training, and equipment. To achieve

this state of affairs, the Isoperformance model is intended to:<br />

(1) Make estimates of training outcomes for different categories of<br />

personnel;<br />

(2) Check internal consistency of estimates;

(3) Compare estimates to known relations from human engineering, personnel, and training;

(4) Counsel how to change "wrong" estimates;

(5) Output Isoperformance curves; and<br />

(6) Leave a hard-copy audit trail.

Isoperformance as a Conceptual Approach to Human Factoring<br />

Isoperformance was inspired by a long history of involvement in human<br />

engineering research, development, test, and evaluation. For example, military specifications and standards form the basis for numerous systems

419



requirements, but their number and complexity often make trade-off decisions difficult because there is no context for their cost. The literature has not

helped either. A U.S. Air Force review of 114 human factors studies from<br />

1958-72 found that the physical characteristics of the stimulus were most often the significant factors in performance outcomes and there were few

interactions.<br />

But these studies tended to ignore the contribution of practice<br />

or individual differences. This general finding suggested the multivariate<br />

(holistic) approach that was subsequently employed by the Navy in over a<br />

decade of simulator research, ranging from carrier landing, to air-to-ground<br />

combat, to Vertical Take-Off and Landing (VTOL) studies. The strong inference

suggested by the results of these later studies was that people accounted for<br />

the most variance in performance, followed by training manipulations, and then<br />

by equipment variations.<br />

In this work, what was surprising was the modest amount of performance<br />

variance that could be accounted for by equipment features. In partitioning<br />

the performance variances over numerous experiments, equipment accounted for 15-20%, trials of practice accounted for 10-25%, and people accounted for

50-80%, with error usually in the 25-50% range. Again, interactions were few<br />

and far between. This implied that these main effects could be traded off if<br />

you started with them in the first place! Thus was born Isoperformance. That<br />

is, because some pilots are simply better than others, because repeated hops<br />

are costly, and because costly changes in equipment features may produce<br />

minimal changes in performance, we should concentrate on trade-offs among<br />

these known relations to bring about desired goals rather than only one or<br />

another mechanism. After the relative contributions are determined, a price<br />

tag can be placed on all dimensions and the cheapest solution sought.<br />

Isoperformance is therefore designed to accomp.lish Personnel, Training and<br />

Equipment trade-offs. In the Isoperformance complete program, the term<br />

Personnel can represent such features as sensory capabilities, cognitive and<br />

information processing abilities, anthropometry, or test scores, such as the<br />

Armed Services Vocational Aptitude Battery (ASVAB) scores, (presently the<br />

default condition). The term Training can represent such features as<br />

practice, sequence and series effects, learning, training regimens and

schedules, or number of sessions. The default condition is trial of practice.<br />

Similarly, the term Equipment can represent new vs. old, smart vs. dumb, hi-fidelity vs. low-fidelity, or any general A vs. B configuration, which is

the default condition. The data sources for estimates of the scale values for<br />

these variables can come from various origins, including lay opinion, the<br />

scientific literature, explicit experiments, or technical data bases.<br />

Isoperformance Curves<br />

Figure 1 presents an illustration of Isoperformance using two categories and one equipment feature (the current technology).

The Personnel category is divided into high- and low- to medium- aptitude<br />

groups, the Training time allotted is 9 weeks, and the proportion of people

that is desired to complete the training successfully is set at 50%. One can<br />

see that it takes the high-aptitude group only 4 weeks to achieve the same proficiency level that it takes the low- and medium-aptitude group to achieve

in 8 weeks.


Figure 1. Illustration of Isoperformance Using Two Categories and One Equipment Feature (The Current Technology)
(Axes in the original plot: proportion of people vs. training time in weeks; separate curves for the high-aptitude and the low-to-medium-aptitude groups.)


Figure 2 shows that, if a second new equipment or technology feature is

introduced (which reduces the training time to reach proficiency), then the<br />

time it takes for the groups to achieve the 50% criterion of proficiency<br />

reduces to 3 weeks and 6 weeks, respectively.<br />

Figure 2. If a Second Equipment Feature is Introduced (Which Reduces the Training Time to Reach Proficiency)
(Axes as in Figure 1: proportion of people vs. training time in weeks, for the high-aptitude and low-to-medium-aptitude groups.)

Figure 3 illustrates the relation between category and equipment<br />

differences in terms of Isoperformance curves, where any point on the curve<br />

identifies the same (Iso) performance. Note that the Equipment difference is<br />

smaller than the “Personnel” difference.<br />

Figure 3. Isoperformance Curves Relating Equipment Differences to Aptitude Differences
(The original plot marks, against training time in weeks, the equipment difference between the current and the new equipment or technology curves and the larger category (aptitude) difference.)

421



From these types of Isoperformance curves, one can determine feasible combinations of personnel, training, and equipment features for any specified level of desired performance. In addition, one can rule out various combinations if there are constraints, for example, on personnel availability or training time.

Isoperformance as a Computerized Program*<br />

An interactive, expert decision aid has been developed to quantify the<br />

trade-off methodology implicit in the Isoperformance approach. The<br />

computer-based "smart" system is intended to aid in decisionmaking by mechanizing the trade-offs between human (aptitude, training) and equipment variations in order to achieve the same (Iso) system performance outcome. The Isoperformance core subprogram is composed of four phases: Specification, Input, Verification, and Output.

Specification, the first phase of the Isoperformance program, requires the<br />

user to state the problem, in effect, by specifying:<br />

(1) the system under study,<br />

(2) what is meant by “proficient performance”,<br />

(3) the aptitude dimension to be used,<br />

(4) how that dimension, is to be divided into ranges or “personnel”<br />

categories, and<br />

(5) the maximum amount of training to be considered.<br />

These specifications are purely descriptive and no relationships have to be<br />

estimated.<br />

Input, the second phase, asks that, for each personnel category, the user<br />

estimate:<br />

(1) the minimum training time necessary for people in that category to

become proficient in terms of number of weeks and<br />

(2) the proportion of persons in the category who are expected to become<br />

proficient given the maximum training time.<br />

The program works by receiving from the user the 2-3 estimates needed for each aptitude category. The estimates can come from any reliable source (e.g., simulators, extrapolations from related tasks, etc.). Estimations are

planned for because it is expected that the data required are not readily<br />

available at the present time. However, the input can also be data from<br />

technical data banks if available.

In the third stage, Verification, the Isoperformance program checks to<br />

make sure that input estimates are “reasonable.” These checks are conducted<br />

whether the input data are “estimates” or actual data Erom a data base or<br />

experiment. There are three types of checks on user estimates: (a) formal,<br />

*Copies of a demonstration disc are available from R. S. Kennedy, Essex Corporation, 1040 Woodcock Road, #227, Orlando, FL 32803


which is a check of logical necessity, (b) general, which compares estimates with known regularities, and (c) specific, which compares user input with

library validities. In general, an implicit correlation between aptitude and<br />

the performance dimension on which “proficiency” is defined can be calculated<br />

at every level of training. Also, the implicit correlations should decrease<br />

with training, and the Isoperformance curves should be decreasing and negatively accelerated. The results of these checks are reported and

explained to the user, together with suggestions as to how the estimate might<br />

be modified to coincide with known regularities and ranges. The fourth phase,<br />

Output, is simply the computer output from the preceding phases.
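The verification rules built into the program are not listed here; the sketch below is a hypothetical illustration of two of the "general" checks described above: that the implied aptitude-performance correlations do not increase with training, and that the minimum-training-time estimates fall as aptitude rises and do so at a diminishing (negatively accelerated) rate. The function name, messages, and data are invented.

def check_estimates(corr_by_training_level, weeks_by_aptitude):
    # corr_by_training_level: implied correlations at successive amounts of training.
    # weeks_by_aptitude: minimum weeks to proficiency for categories ordered low -> high aptitude.
    issues = []
    if any(later > earlier for earlier, later in zip(corr_by_training_level,
                                                     corr_by_training_level[1:])):
        issues.append("implied correlations should not increase with training")
    drops = [a - b for a, b in zip(weeks_by_aptitude, weeks_by_aptitude[1:])]
    if any(d <= 0 for d in drops):
        issues.append("training time should decrease as aptitude increases")
    if any(later > earlier for earlier, later in zip(drops, drops[1:])):
        issues.append("successive reductions should shrink (negatively accelerated curve)")
    return issues or ["estimates pass the general checks"]

print(check_estimates([0.55, 0.45, 0.40], [9.0, 7.0, 6.0, 5.5]))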

AN APPLICATION<br />

The Isoperformance methodology can be applied to numerous human factors areas. Here, we will use as an exemplar freedom from simulator sickness in

ground-based flight trainers. Motion sickness is a common problem in the<br />

military, particularly in testing and simulation devices. Virtually everyone<br />

with intact organs of equilibrium is susceptible to one form or another, but<br />

some people get sick all the time while others are virtually immune. However,<br />

we know that practice usually results in adaptation to motion sickness, and<br />

some specific equipment configurations are more conducive to adaptation than<br />

others (e.g., 0.2 Hz).

An example of the approach for applying Isoperformance to simulator

sickness is as follows:<br />

(1) Obtain a large data base with simulator sickness incidence,<br />

(2) Determine the relationship for each variable,<br />

(3) Isolate variables which are causal,<br />

(4) Select acceptable Isoperformance levels,<br />

(5) Calculate Isoperformance curves using two continuous causal variables<br />

as X/Y and one dichotomous causal variable as comparison.

Afterwards, it is possible to put cost values on the outcomes and determine<br />

trade-offs from which decisions can be made.

Therefore, we took from our large data base (N > 1000) of simulator<br />

sickness a series of correlational relationships. We cast them into a<br />

multiple regression equation and obtained the beta weights for such continuous<br />

variables as length of hop, whether visuals are on/off, field of view, usual<br />

state of fitness, etc. Using the continuous variables, plus the dichotomous fit/unfit dimension, we created Figure 4. Note that a four and one-half hour

hop using a 305-degree field of view for a pilot who was fit would have the<br />

same simulator sickness score (110) as a pilot who had been ill and flew a<br />

two-hour hop with a 195 degree field-of-view.<br />
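The regression coefficients themselves are not reported in this paper, so the sketch below uses invented values purely to show the mechanics: once a linear model of sickness score on hop length, field of view, and recent-illness status has been fitted, an Isoperformance curve is obtained by fixing the score at the chosen level and solving for hop length across field-of-view settings.

# Hypothetical coefficients (illustrative only; not the study's beta weights).
b0, b_hop, b_fov, b_ill = 20.0, 15.0, 0.10, 30.0

def hop_length_for_iso(target_score, fov_deg, recently_ill):
    # Solve b0 + b_hop*hours + b_fov*fov + b_ill*ill = target_score for hours.
    return (target_score - b0 - b_fov * fov_deg - b_ill * recently_ill) / b_hop

for fov in (305, 254, 193, 110):
    fit = hop_length_for_iso(110, fov, recently_ill=0)
    ill = hop_length_for_iso(110, fov, recently_ill=1)
    print(f"FOV {fov:3d} deg: fit pilot {fit:4.1f} h, recently ill pilot {ill:4.1f} h")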

423


Figure 4. Isoperformance Curves Comparing Simulator Field of View, Hop Length, and Pilots' Report of Recent Illness
(Axes in the original plot: estimated hop length in hours, visuals on, vs. field of view in degrees; separate curves for pilots reporting usual fitness and those reporting recent illness.)

CONCLUSIONS<br />

In Navy flight simulator studies, half of the variance appears to be<br />

attributable to Personnel differences, with Training and Equipment dividing<br />

the rest. Therefore, it is more informative to know who is flying than what trial of practice or on what equipment they are flying. The case for

simulator sickness is similar. It appears that a considerable amount of<br />

variance in motion sickness research is attributable to Personnel differences<br />

with smaller proportions attributable to Equipment and Practice.<br />

In general, we believe that Isoperformance goals have merit because they estimate training outcomes by:

(1) Forcing the user to make estimates of training outcomes for different<br />

personnel categories.<br />

(2) Providing checks on the internal consistency and logical coherence of<br />

these estimates.<br />

(3) Providing checks on how well the estimates conform to known

regularities from human engineering, personnel, and training research.<br />

(4) Informing the user as to the results of these checks, together with<br />

information about what can be done to make estimates consistent or<br />

bring them into closer conformity with known regularities and facts.<br />

(5) Leaving a hard-copy audit trail of all estimates, feedback, and<br />

outputted Isoperformance curves.<br />

Implementation of this model can help human engineering practitioners,<br />

training systems designers or human resource managers compare the relative<br />

costs of differing combinations that lead to the same performance level. This trade-off technology is especially relevant today given projected constraints in military budgets.

424


FINAL REPORT, COMPUTER ASSISTED GUIDANCE INFORMATION SYSTEMS<br />

BAYES, Andrew H., Defense Activity for Non-Traditional Education Support, Pensacola, FL 32509-7400

INTRODUCTION<br />

In June 1989 DANTES released a final report covering the pilot study of four

computer based guidance information delivery systems. In this report, a major<br />

recommendation was that the pilot study be expanded and additional data be<br />

gathered.<br />

The pilot study was expanded to a total of 102 sites in all active duty<br />

Services and to two Air Force Reserve sites. Regional training was conducted<br />

and only those sites that attended training were given the software.<br />

Each site was given User Surveys (Tab A) to be completed by each participant.<br />

The data from these surveys have been summarized in this report.<br />

SOFTWARE<br />

Based upon the results of the pilot study, DISCOVER by American College<br />

Testing and GIS by Houghton-Mifflin/Riverside were the software systems used in this expanded pilot study. They were chosen because they were the two

highest rated by both the counselors and the clients. While they represent<br />

two different styles of counseling, they both contained the same basic<br />

modules and information. GIS provides more specific information on<br />

occupations and education, while DISCOVER uses the more traditional counseling

approach.<br />

DATA COLLECTION<br />

The User Surveys requested demographic data as well as data reflecting the<br />

reactions of the clients to the software. It is interesting to note the data<br />

trends when pay grade or education is compared to reactions.<br />

STATEMENT OF PURPOSE<br />

* To determine if these systems were meeting expressed needs of the client<br />

population.<br />

* To determine if these systems were a valuable addition to the resources available from the education centers.

* To determine what if any additional data bases would be valuable.<br />

* To determine if the systems were cost effective.<br />

* To determine if counselor time was better utilized as a result of clients having used CAGIS.


DATA ANALYSIS AND INTERPRETATION<br />

While over 800 User Surveys were analyzed, the numbers do not remain constant<br />

because in some cases, the pay grade was not available, not all questions were<br />

answered by all respondents, or directions for completing the forms were not<br />

followed. It is felt, however, that enough data were collected that the results

are valid and do represent the cross section of clients visiting the education<br />

centers. It would be risky, however, to extrapolate from this data to the<br />

entire Military population. In a sense, the data here represent only the

reactions of individuals visiting the education centers and this may be a<br />

special sub-population. No attempt has been made to compare this population<br />

with the Military in general.

TABLE I
EDUCATIONAL LEVEL BY PAY GRADE

                          EDUCATIONAL LEVEL
PAY GRADE    1      2      3      4      5      6      7      8      9
   E1       .01    .60    .30    .01     0      0      0      0      0
   E2        0     .77    .63     0      0      0      0      0      0
   E3       .01    .51    .29    .04    .04    .10     0      0      0
   E4        0     .47    .38    .06    .04    .02   <.01   <.01     0
   E5        0     .32    .30    .10    .09    .08    .02    .01     0
   E6        0     .33    .33    .16    .09    .03    .05     0      0
   E7       .04    .13    .33    .15    .11    .15     0     .06    .02
   E8        0     .17    .22    .17     0     .33     0     .11     0
   E9        0     .19     0     .31    .25    .06     0     .19     0

Officer grades:
   01   .25   0     .50   .12   .12   0
   02   .78   .11   .11   0
   03   .62   .15   .23   0
   04   .09   .18   .72   0
   05   .11   .22   .22   .44   0

EDUCATIONAL LEVEL 1 = No diploma<br />

EDUCATIONAL LEVEL 2 = High School/GED<br />

EDUCATIONAL LEVEL 3 = l-2 Years of College<br />

EDUCATIONAL LEVEL 4 = AA/AS Degree<br />

EDUCATIONAL LEVEL 5 = 3-4 Years of college<br />

EDUCATIONAL LEVEL 6 = BA/BS Degree<br />

EDUCATIONAL LEVEL 7 = Some graduate study<br />

EDUCATIONAL LEVEL 8 = Masters degree<br />

EDUCATIONAL LEVEL 9 = Doctorate<br />
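Proportion tables of this kind are straightforward to generate from the raw User Survey records. The sketch below is hypothetical (the variable names are invented and only a few illustrative rows are shown); it row-normalizes a pay-grade by education-level crosstab so each row sums to approximately 1.00, as in Table I.

import pandas as pd

# Hypothetical extract of User Survey records: one row per respondent.
surveys = pd.DataFrame({
    "pay_grade": ["E3", "E3", "E5", "E5", "E5", "E6"],
    "ed_level":  [2, 3, 2, 3, 6, 4],   # codes 1-9 as defined in the legend above
})

# Proportion of respondents in each pay grade at each educational level.
table_i = pd.crosstab(surveys["pay_grade"], surveys["ed_level"], normalize="index").round(2)
print(table_i)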

CAREER PLANS<br />

When asked to describe their career plans, the enlisted population at the<br />

El-E4 level indicated that they planned to leave after their current<br />

enlistment. E5s were almost evenly divided between remaining on active duty

426



until retirement and being uncertain about leaving after their present<br />

enlistment. E6-E8 indicated that they planned to stay until retirement. The

following table indicates career plans by branch of Service and pay grade.<br />

427


TABLE II
CAREER PLANS

                   1      2      3      4      5
PAY GRADE
   E1             .08    .08    .25    .50    .08
   E2             .11    .04    .21    .43    .21
   E3             .11    .51    .16    .43    .24
   E4             .12    .05    .10    .42    .31
   E5             .33    .05    .18    .30    .13
   E6             .70    .02    .08    .16    .03
   E8             .72    .11    .08    .05    .03
   E9             .96     0      0      0     .03

SERVICE
   ARMY           .20    .05    .09    .38    .27
   AIR FORCE      .41    .04    .12    .31    .11
   NAVY           .29    .04    .18    .29    .20
   MARINES        .40    .03    .08    .38    .11

CAREER PLANS:<br />

1=Probably stay until retirement

2=Stay beyond present obligation but not to retirement<br />

3=Probably stay beyond present obligation but not until retirement<br />

4=Probably leave after present obligation<br />

5=Definitely leave after present obligation<br />

While the majority of clients learned about CAGIS when they visited the Education Center or attended a briefing by the Education Center staff, about 1/3 learned about CAGIS from a co-worker. This would indicate that the

program was felt to be valuable enough to recommend it to a friend.<br />

Because part of the data collection was designed to determine the relative effectiveness of the two systems, several comparisons were made. Eighty-one percent of the sites received GIS but only 45% of the User Surveys were returned from GIS sites. Clients spent more time using Discover (29% spent 46 to 60 minutes) as opposed to GIS users who spent less time with that system (34% spent 16

to 30 minutes). Seventy-one percent of the clients spent between sixteen and<br />

sixty minutes using the software. It is interesting to note that E3s and E6s<br />

are spending the most time using the computer. This appears to be a critical<br />

time for them in their careers. Eighty-one percent of the GIS users felt they<br />

understood the system they used, while 91 percent of the Discover users felt<br />

they understood that system.<br />

EFFECTIVENESS OF SYSTEMS<br />

In an attempt to determine the effectiveness of the two systems, clients were<br />

asked to compare the computer systems with other types of reference materials.<br />

The first question asked the client to rate the CAGIS information in relation

to any other reference source. Eighty-two percent of the users rated CAGIS<br />

either superior or better than any other sources. Clients were then asked about the currency of the information, and again 77% rated the information

either superior or decidedly more current.


As a further measure of the effectiveness of the systems, the clients were<br />

asked if they talked with a counselor following their session on the computer,<br />

and if they did talk with a counselor did they feel better prepared. Forty<br />

percent of the users did not talk with a counselor after interacting with the<br />

software. Ninety-one percent of those that talked with a counselor stated<br />

that they were better prepared to talk with a counselor. Seventy-five percent<br />

of those that did not talk with a counselor felt that they did not need to<br />

talk with a counselor because the system had answered all their questions.<br />

These statements would indicate that the systems are maximizing the counselor<br />

resources by screening out those clients that were basically seeking only<br />

information. This frees the counselors to do counseling and relieves them of<br />

simple information giving.<br />

RANKING THE DATA BASES<br />

When asked to rank which data base they felt was most useful, the percentage<br />

of clients ranked the following data bases as number one:<br />

Civilian Careers .35<br />

Undergraduate Degrees .25<br />

Graduate Degrees .13<br />

Military/Civilian Crosswalk .11

Financial Aid .07

Military Careers .06

Resume .02<br />

OVERALL RATING OF CAGIS<br />

As a feature in the services provided by the education offices, users were

asked to rate the CAGIS they used. For the two systems, the following ratings<br />

were assigned:

TABLE III<br />

RATING<br />

1 2 3 4 5<br />

GIS        .56    .37    .06    <.01    <.01
DISCOVER   .62    .34    .04    <.01    <.01

1=Essential  2=Important  3=Neutral  4=Not important  5=Not required

CONCLUSIONS<br />

In reviewing the Statement of Purpose, the data supports each statement.<br />

The systems are meeting the needs of the clients as evidenced by comparing the<br />

responses of the clients as to which data bases they used and how they<br />

ultimately ranked those data bases. This is also shown by the responses to<br />

the completeness and currency of the information provided.<br />

429<br />

---_


Clearly these systems are felt to be valuable resources. Over 90% of the<br />

users felt the systems were either essential or very important additions to<br />

the education centers.<br />

Cost effectiveness is difficult to determine, but when one notes the amount of time spent using the software and compares this to the hourly cost of a GS 9/11 guidance counselor, it is apparent that money is being saved. With increased quantities, the price of the leases becomes even less. Site licenses also provide more software at an even greater saving. The fact that family members and DOD civilians can also access the systems at no additional cost further enhances the cost effectiveness.
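The report does not show the cost arithmetic, so the sketch below is a hypothetical back-of-the-envelope comparison only, with every figure invented: it values the counselor time displaced by CAGIS sessions at an hourly rate and nets out an annual lease cost.

def annual_net_savings(sessions_per_year, minutes_per_session,
                       counselor_hourly_cost, annual_lease_cost):
    # Assume each CAGIS session displaces an equal amount of one-on-one information giving.
    counselor_hours_displaced = sessions_per_year * minutes_per_session / 60.0
    return counselor_hours_displaced * counselor_hourly_cost - annual_lease_cost

# Invented example figures for one education center.
print(annual_net_savings(sessions_per_year=600, minutes_per_session=45,
                         counselor_hourly_cost=18.0, annual_lease_cost=2500.0))   # 5600.0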


Better utilization of the counselor time is apparent by the data indicating<br />

the number of users that did not need to meet with a counselor upon completion

of their use of the software. The number of clients indicating that they were<br />

better prepared to meet with a counselor also allows the counselor to provide<br />

assistance with things other than simple information giving (e.g., information integration).

RECOMMENDATIONS<br />

1. DANTES seriously investigate the possibility of adding some of the<br />

requested additional data bases. Working with the vendors to incorporate

these data bases into the existing systems should be relatively easy. The<br />

vendors have been asked to identify SOC schools in their next editions.<br />

2. As the personnel resources of the education centers are being drawn down<br />

and more Service members are being released from the Service, these systems<br />

should be expanded to reduce the quantity of personal counseling. Education<br />

centers should consider increasing their investment in computer hardware in<br />

order to expand their counseling efforts. When on-site scoring becomes<br />

possible, sites will want the capability to take advantage of this enhancement.<br />

DANTES plans to expand the program to approximately 250 sites, but only

to those sites that are willing to participate in training and have the<br />

hardware available. Several Educational Services Officers stated that the<br />

addition of CAGIS was extremely valuable in augmenting their resources for<br />

Project Transition.<br />

3. Increasing the "user friendliness" should be a major objective of the<br />

vendors. While the information in the systems is valuable, difficulty in<br />

accessing the information diminishes the usefulness of the systems. The<br />

vendors need to be aware of this shortcoming and either provide additional

training or provide more technical support to the education centers.<br />

4. Counselors need to overcome their reluctance to use the computers. Their<br />

resistance to become involved with computers is denying wider use of the<br />

systems. The counselors do not really know how much information each system<br />

has and, in consequence, do not take full advantage of the breadth of information

available. An effort should be made to work with the counselors during workshops<br />

and national conventions.<br />

430


5. Training is essential to the success of the program. It is recommended<br />

that someone from DANTES attend each of the training sessions. This was not done for this portion of the pilot study and, in consequence, data collection

was slow and many follow-up letters had to be written.<br />

6. The need for extensive and current data is apparent. Many outdated and/or<br />

hardbound references can be replaced by the CAGIS software. DANTES should<br />

consider distributing reference materials less frequently and rely more on the<br />

information available in the CAGIS software.<br />

SUMMARY<br />

The data from this expanded pilot study clearly substantiates the data from

the initial pilot study. The systems have considerable value not only to the<br />

clients but also the education center personnel. It is cost effective, up-to-date,

thorough, and most importantly readily available. The users, ranging<br />

from active duty personnel to DOD civilians and family members, indicated very

strongly that this is an essential service.<br />

As the <strong>Military</strong> enters into a period of austere funding and personnel<br />

reductions, programs such as CAGIS will become increasingly important to help<br />

personnel make the transition back to the civilian workplace and higher<br />

education. Comments from program administrators clearly demonstrate the<br />

feeling that these programs are going to fill a very large gap in their<br />

services.<br />

Education centers need to move more rapidly into the world of automation and<br />

take advantage of the information explosion. These systems are going to make<br />

information retrieval instantaneous and eliminate hours of tedious research<br />

using hard cover reference materials.<br />

It would appear from the current data that these systems have value for all<br />

pay grades and all branches of Service. The program should be expanded to<br />

allow all sites that have a need to be able to access one of the systems.<br />

431


VERTICAL COHESION PATTERNS IN LIGHT INFANTRY UNITS¹

Cathie E. Alderks<br />

U.S. Army Research Institute for the<br />

Behavioral and Social Sciences<br />

Alexandria, VA<br />

Researchers have shown that strong cohesion among soldiers<br />

as well as cohesion within platoon level leadership teams has a<br />

consistent association with platoon performance and the ability<br />

to withstand stress (Siebold and Kelly, 1988a, 1988b). However,<br />

research pertaining to the impact of vertical cohesion up and<br />

down the chain of command on small unit performance is limited.<br />

In this paper the pattern of vertical cohesion from squad through<br />

company and its impact on performance at Army Combat Training<br />

Centers are examined.<br />

METHOD AND SAMPLE<br />

Data were collected by questionnaire from soldiers and<br />

leaders within five light infantry battalions (N = 60 platoons)<br />

at three points in time. The first point in time (Base) occurred<br />

4-6 months before the battalion was scheduled to go through a<br />

training rotation at either the U.S. Army National Training<br />

Center (NTC), Fort Irwin, CA, or the U.S. Army Joint Readiness<br />

Training Center (JRTC), Fort Chaffee, AR. The second point in<br />

time (Pre-rotation) was 2-4 weeks prior to the rotation; the<br />

third point (Post-rotation) occurred 2-4 weeks following the<br />

training rotation.<br />

Base and pre-rotation questionnaires were administered by<br />

researchers from the U.S. Army Research Institute to platoon<br />

level soldiers (squad members (SM), squad leaders (SL), platoon<br />

sergeants (PS), and platoon leaders (PL)) one company at a time<br />

in either a classroom or dayroom setting. Soldiers took<br />

approximately 30 minutes to complete the 160-item questionnaire<br />

after instructions. Soldiers responded on a machine readable<br />

answer sheet. Post-rotation questionnaires were given at the<br />

start of interviews in an office or dayroom setting to the<br />

following groups of soldiers within a company: 1) all PLs, 2) all<br />

PSs, 3) two-thirds of the SLs, and 4) all SMs from one intact

squad in the company. Post-rotation questionnaires were short<br />

(21 items plus some unit and position identification questions)
and took soldiers less than 10 minutes to complete; responses

were made on the questionnaire itself.<br />

'The views expressed in this paper are those of the author<br />

and do not necessarily reflect the views of the U.S. Army<br />

Research Institute or the Department of the Army.<br />



Post Selection Board Analysis<br />

Post-selection board review of the 1986/87 NROTC scholarship<br />

year pointed to the need to build more structure into the<br />

evaluation system in order to (1) provide more consistency in<br />

the evaluation of records and (2) permit the selection of those<br />

who were truly best qualified in both an academic and potential<br />

officer sense.<br />

Assessment of the criteria used by board members to assign<br />

points to an application suggested that there was wide variance<br />

among board members in the value placed on the level of a<br />

student's academic or extracurricular performance and the type<br />

of student extracurricular activity. For example, some board<br />

members felt that athletic participation was essential for<br />

success as an officer; others did not. Applications were<br />

scored accordingly, with the resulting selection scores<br />

dependent upon the values of the particular selection board<br />

members assigned to review an application. This created the<br />

potential for wide variance in the scoring of similar<br />

applications by different selection boards.<br />

Analysis of the scores assigned by the weekly boards revealed<br />

that the average score awarded was over 80 points (out of<br />

100). This meant that weekly selection board members had very<br />

little ability to "reach down" to select an applicant who came<br />

to the selection process with a less competitive Quality Index,<br />

regardless of the merit of the applicant.<br />

Solution<br />

To address the problems of evaluation consistency and the<br />

extremely high average selection board score, a more formal<br />

method of application evaluation was instituted. Applicant<br />

evaluation categories were developed from observation of board<br />

member discussion during the initial weekly selection board<br />

sessions. Those areas that selection board members appeared to<br />

value consistently as most important when discriminating<br />

between competitive scholarship applicants were incorporated<br />

into a revised applicant evaluation system. Each evaluation<br />

category was also assigned a scoring level maximum . Optical<br />

Mark Reading (OMR) equipment was purchased and the NROTC<br />

Scholarship application was redesigned to be read by an optical<br />

scanner. Additionally, a formal selection board training<br />

program was developed to ensure that each weekly selection<br />

board began the selection board process with the same<br />

application evaluation guidance.<br />

This revised selection system was finalized during the summer<br />

of 1987 and used by the first weekly selection board of the<br />

1987/1988 NROTC program year. Each year, data based on<br />

selection board actions are reviewed and the system modified as<br />



The base and pre-rotation questionnaires contained items<br />

which formed scales measuring interpersonal, organizational, and<br />

leadership constructs (e.g., SM horizontal cohesion, job<br />

satisfaction, command climate, training effectiveness), as well<br />

as various demographic items. The post-rotation questionnaires<br />

focused mainly on soldier perceptions of performance during their<br />

recent rotation. In addition, for the two battalions which<br />

rotated through the JRTC, ratings on leader and platoon<br />

performance were provided just after the rotation by the<br />

observer/controllers (OCs) who observed each platoon during the

rotation. In other words, the base and pre-rotation<br />

questionnaires contained the home station determinants<br />

(predictors) of performance; the post-questionnaires and the<br />

ratings from the OCs provided criterion measures for that<br />

performance.<br />

For the present paper, only Pre-rotation interpersonal<br />

scales and Post-rotation performance scores were considered.<br />

Platoon scores were obtained for each of the scales using a mean<br />

aggregate procedure. Standard scores were obtained to compare<br />

scores on the same scale.<br />

Vertical cohesion scales were used to examine the strength<br />

of each segment in each SM to Company Commander (CC) chain of<br />

command. These scales included 1) SM rating SL, 2) SM rating PS,<br />

3) SM rating PL, 4) SL rating PS, 5) SL rating PL, 6) PS rating<br />

PL, 7) PS rating CC, and 8) PL rating CC. It must be emphasized<br />

that in each case, a subordinate was rating a superior. These<br />

scales were composed of items such as "the leader treats us<br />

fairly", "the leader looks out for the welfare of his people",<br />

"the leader is friendly and approachable", "the leader pulls his<br />

share of the load in the field", and "the leader would have my<br />

confidence if we were in combat together". Scale item factor<br />

loadings (where N was sufficiently large to justify a factor<br />

analysis, i.e., sets of scale ratings by SMs and SLs) were
.80-.87, .79-.86, and .80-.86 for SMs rating SLs, PSs, and PLs,

respectively, and .65-.88 and .72-.90 for SLs rating PSs and PLs,<br />

respectively, with each scale forming independent factors.<br />

Performance scales were obtained from ratings of missions<br />

performed at JRTC/NTC. They were determined four ways: 1) OC

ratings, 2) CC ratings, 3) Platoon ratings composed of the mean<br />

ratings of the PL, PS, SLs, SMs with each level receiving a<br />

weight of one, and 4) Overall ratings composed of the mean<br />

ratings of the OCs, CC, PL, PS, SLs, and SMs with each level

receiving a weight of one.<br />

Two approaches were chosen to examine vertical cohesion in<br />

the chain of command. The first approach was to identify the<br />

lowest break. The rationale was that since the lower leaders<br />

oversee the squad members who accomplish the direct fighting<br />

tasks, lower breaks in the chain of command might have a greater<br />

impact on direct platoon performance than breaks that occurred<br />


higher. The second approach was to count the total number of<br />

breaks that occurred anywhere in the chain of command. In both<br />

approaches, z-scores were computed to determine if and where<br />

breaks occurred. The decision rule for a break to occur required<br />

a z-score ≤ -.5 on the scale measuring cohesion between one
position in the chain of command and a higher position. Where
two or more scores for rating a particular leader were available
(e.g., SMs, SLs, and PSs each rating PL), only one of the scores
was required to meet the decision rule of z ≤ -.5. By example, a
platoon could have a z ≤ -.5 at the SM-SL level and also at the

PL-CC level. It would be included in the SL lowest break group<br />

and not considered for further lowest break groups. However, in<br />

composing the number of break groups, this platoon would be<br />

counted as having two breaks.<br />
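The break-identification rules just described can be illustrated with a short
sketch. This is a hypothetical reconstruction in Python, not the analysis code
actually used; the scale names, data layout, and helper function are invented
for illustration, while the z ≤ -.5 cutoff and the ordering of links follow the
text.

# Illustrative sketch of the break-identification rules described above.
# Assumes platoon-level z-scores have already been computed for each
# vertical cohesion scale; the names and data layout are hypothetical.

# Chain-of-command links, ordered from lowest to highest. Each link lists
# the scales (a subordinate rating a superior) that feed it.
LINKS = [
    ("SL", ["SM_rates_SL"]),
    ("PS", ["SM_rates_PS", "SL_rates_PS"]),
    ("PL", ["SM_rates_PL", "SL_rates_PL", "PS_rates_PL"]),
    ("CC", ["PS_rates_CC", "PL_rates_CC"]),
]

CUTOFF = -0.5  # a link is broken when any of its z-scores is <= -.5


def classify_platoon(zscores):
    """Return (lowest_break, number_of_breaks) for one platoon.

    zscores maps scale name -> platoon-level standardized score.
    """
    lowest_break = "NONE"
    n_breaks = 0
    for level, scales in LINKS:
        if any(zscores[s] <= CUTOFF for s in scales):
            n_breaks += 1
            if lowest_break == "NONE":
                lowest_break = level  # first break found is the lowest one
    return lowest_break, n_breaks


# Example: weak SM-SL and PL-CC cohesion gives lowest break = SL and a
# total of two breaks, as in the example in the text.
example = {"SM_rates_SL": -0.9, "SM_rates_PS": 0.2, "SM_rates_PL": 0.1,
           "SL_rates_PS": 0.4, "SL_rates_PL": 0.3, "PS_rates_PL": 0.0,
           "PS_rates_CC": 0.1, "PL_rates_CC": -0.7}
print(classify_platoon(example))  # ('SL', 2)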

RESULTS AND DISCUSSION<br />

The lowest break in the chain of command could occur at any<br />

point. Table 1 shows where the lowest level break occurs and<br />

lists the number of platoons per battalion in each of the

categories.<br />

Table 1.

Frequency at Which the              Frequency Distribution of Lowest
Lowest Break Occurred               Break by Battalion

Level of                                         Level of Lowest Break
Lowest Break    Freq.    %          Battalion    SL   PS   PL   CC   NONE

SL               20     33          V             3    0    3    3     3
PS               16     27          W             4    5    2    0     1
PL               10     17          X             3    2    1    1     5
CC                4      7          Y             6    4    1    0     1
NONE             10     17          Z             4    5    3    0     0

Table 2 gives similar information for the analysis approach<br />

considering the total number of breaks within each platoon<br />

focused chain of command. As there were four levels within each<br />

chain, a range of zero to four breaks was possible.<br />

Correlations indicating the relationship between the lowest<br />

break and the number of breaks in the vertical cohesion chain of<br />

command with the performance scales are listed in Table 3.


Table 2.

Frequency of Total Number           Frequency Distribution for the
of Breaks per Platoon               Number of Breaks by Battalion

Total Number    Platoon                             Number of Breaks
of Breaks       Freq.    %          Battalion     0    1    2    3    4

0                10     17          V             3    3    6    0    0
1                17     29          W             1    3    6    2    0
2                21     34          X             5    3    2    2    0
3                 9     15          Y             1    6    2    2    1
4                 3      5          Z             0    2    5    3    2

Table 3. Lowest Break and Number of Breaks Correlated with
Performance Measures

Type of                      Platoon Performance Rated By:
Measure                      OC        CC        PLT       OVERALL

LOWEST BREAK                .37       .03       .33**      .zG"-k
NUMBER OF BREAKS          -*Zig*     -.34*     -.37**      -.44***

* p < .05   ** p < .01   *** p < .001

Figures 1 and 2 illustrate the relationship between the mean<br />

overall performance scores and the lowest break and number of<br />

breaks conditions, respectively. Analysis of variance provides<br />

an F of 3.74, p < .01 and an F of 4.47, p < .004 for the data in

Figures 1 and 2, respectively. Similar results were obtained by<br />

using any of the other methods of computing the performance<br />

measures.<br />

Examination of Figure 1 reveals that platoon performance is<br />

most degraded when either the PS or the PL is at the position of<br />

the lowest break. Performance is better than average when the<br />

lowest break in vertical cohesion occurs at the CC level and is<br />

best when the vertical chain has no breaks at all. Since<br />

performance measurement was at the platoon level, some clouding



Figure 1. Lowest Break in Vertical Cohesion by Mean Platoon Performance
[Bar chart: mean standardized platoon performance by duty position of the
lowest break (SL, PS, PL, CC, NONE).]

Figure 2. Number of Breaks in Vertical Cohesion by Mean Platoon Performance
[Bar chart: mean standardized platoon performance by number of breaks
(4, 3, 2, 1, 0).]



of the results occurred at the SL level of breaks. Seldom would<br />

one find all SM-SL links within a platoon equally rated.<br />

Therefore, taking an average SM-SL cohesion rating for the three<br />

squads within a platoon moderated particularly strong or weak<br />

links. A break at the SM-SL level would meet the z ≤ -.5

criterion only if one or more of the SM-SL links were extremely<br />

weak. Nevertheless, good links in other squads could compensate<br />

and result in the platoon having acceptable performance. This,<br />

and other explanations are being studied.<br />

Examination of Figure 2 reveals additional findings.<br />

Generally, the fewer cohesion breaks there are, the better the<br />

performance with performance being best when there are no<br />

cohesion breaks at all. Performance is maintained at an average<br />

level with one or two breaks. Additional breaks in cohesion<br />

correspond to less than average performance.<br />

In summary, while a causal relationship can not be inferred,<br />

it appears that the strength of vertical cohesion as measured<br />

prior to engagement is a good predictor of platoon performance at<br />

a Combat Training Center. Vertical cohesion appears most<br />

important to platoon performance at the top platoon leadership<br />

levels, that of PS and PL. Where cohesion breaks at this level,<br />

performance tends to be less effective. However, when vertical<br />

cohesion is strong (that is, when subordinates see their<br />

superiors as taking care of them and being skilled), performance

is strong. These findings are important because they<br />

quantitatively confirm "common lore"; they suggest the cohesive<br />

strength of a chain can be measured, and they indicate that the<br />

success of any efforts to increase or maintain the strength of<br />

vertical cohesion in a platoon focused chain of command can be<br />

assessed against a clear criterion measure.<br />

REFERENCES<br />

Siebold, G.L. and Kelly, D.R. (1988a). The impact of cohesion on
platoon performance at the Joint Readiness Training Center.
Technical Report 812. Alexandria, VA: U.S. Army Research
Institute for the Behavioral and Social Sciences. ADA 202926.

Siebold, G.L. and Kelly, D.R. (1988b). A measure of cohesion which
predicts unit performance and ability to withstand stress.
Proceedings: Sixth Users' Workshop on Combat Stress, San
Antonio, TX, 30 Nov-4 Dec 1987. Consultation Report 88-003.
Fort Sam Houston, TX: Health Care Studies and Clinical
Investigation Activity, Health Services Command.



THE USE OF INCENTIVES IN LIGHT INFANTRY UNITS'<br />

Twila J. Lindsay and Guy L. Siebold<br />

U.S. Army Research Institute for the<br />

Behavioral and Social Sciences<br />

The research described in this paper is part of a larger<br />

project to examine the home station determinants of subsequent<br />

small unit performance at U.S. Army Combat Training Centers.<br />

This paper focuses on describing the patterns of utilization of<br />

standard incentives in units and the extent to which these<br />

patterns were associated with other organizational variables and
small unit performance. The incentives examined were "Public
recognition for a job well done", "Passes", "Awards",
"Specialized training courses", "Letters of appreciation or
commendation", and "Promotions."

METHOD AND SAMPLE<br />

Data were collected by questionnaire from soldiers within<br />

five light infantry battalions (N = 60 platoons) at three points<br />

in time. The first point in time (base) was 4-6 months before<br />

each battalion was scheduled to go through a training rotation at<br />

either the U.S. Army National Training Center (NTC), Fort Irwin,<br />

CA or the U.S. Army Joint Readiness Training Center (JRTC), Fort<br />

Chaffee, AR. The second point in time (pre-rotation) was 2-4<br />

weeks before the rotation; the third point (post-rotation) was<br />

about 2-4 weeks after the training rotation. There were two<br />

other sources of data: a) platoon mission performance ratings at

JRTC by the platoon level observer/controllers (O/Cs) on 23 of<br />

the platoons, and b) company commanders' ratings of the mission

performance of their subordinate combat platoons at NTC/JRTC.<br />

Base and pre-rotation questionnaires were given typically to<br />

all soldiers (squad members through platoon leader) in one<br />

company at one time in a classroom or dayroom setting. The<br />

soldiers responded on machine-readable answer sheets. The<br />

questionnaires consisted of about 160 items and took the average<br />

soldier about 30 minutes to complete after instructions. Post-rotation

questionnaires were short (21 items plus some unit and<br />

position identification questions) and took soldiers less than 10<br />

minutes to complete; responses were made on the questionnaire<br />

itself. Post-rotation questionnaires were given at the start of<br />

group interviews to four separate groups of soldiers in a company<br />

(platoon leaders, platoon sergeants, squad leaders, and members<br />

of one intact squad). Post-rotation questionnaires, along with<br />

the subsequent group interviews, were usually given in an office<br />

or dayroom setting.<br />

The base and pre-rotation questionnaires contained items on

'The views expressed in this paper are those of the authors and
do not necessarily reflect the views of the U.S. Army Research
Institute or the Department of the Army.


incentive utilization, scales measuring important interpersonal<br />

and organizational constructs, and various demographic items.<br />

The post-rotation questionnaires focused on soldier perceptions<br />

(self ratings) of mission performance during their recent<br />

rotation. In other words, the base and pre-rotation<br />

questionnaires contained the home station determinants<br />

(predictors) of performance, including utilization of incentives;<br />

the post-rotation questionnaires (platoon self ratings) and<br />

ratings by the O/Cs and company commanders functioned as<br />

criterion measures of that performance.<br />

The analyses prepared for this paper focused on the<br />

responses from the squad members to the pre-rotation<br />

questionnaire which included a measure of incentive use. The<br />

soldiers assessed the utilization of each incentive; an<br />

aggregation of responses to the items was used to assess the<br />

total level of incentive utilization. The use of each incentive<br />

was assessed by a five point scale: 1 = seldom used, 2 = used<br />

occasionally, sometimes for the wrong people, 3 = used<br />

occasionally, given to the right people, 4 = used often,<br />

sometimes given to the wrong people, 5 = used often, given to the

right people. A two dimensional response scale was used due to<br />

shortage of questionnaire space.<br />
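As a concrete illustration of this aggregation, the sketch below is
hypothetical Python with invented names and data layout (the paper does not
publish its scoring code): it averages the 1-5 item responses within a
platoon and then averages the six item means into an aggregate incentive
score, one plausible reading of the aggregation described above.

# Minimal sketch of platoon-level aggregation of the incentive items.
# Squad-member responses use the five point scale defined in the text.
from statistics import mean

INCENTIVES = ["public_recognition", "passes", "awards",
              "training_courses", "letters", "promotions"]


def platoon_incentive_means(member_responses):
    """Average squad-member item responses (1-5) within one platoon.

    member_responses is a list of dicts, one per squad member, mapping
    each incentive item to that soldier's rating.
    """
    item_means = {item: mean(r[item] for r in member_responses)
                  for item in INCENTIVES}
    # Aggregate score: mean of the six item-level means (an assumption
    # about how "an aggregation of responses to the items" was formed).
    item_means["aggregate"] = mean(item_means[i] for i in INCENTIVES)
    return item_means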

RESULTS<br />

The distribution of overall individual squad member<br />

responses assessing the utilization of each incentive is<br />

illustrated in Figure 1. The figure indicates that giving
"Passes" was the incentive most frequently utilized and the

incentive most often given to the right soldier. The least<br />

utilized incentive was "Letters of appreciation or commendation."<br />

The incentive seen as most often given to the wrong person was<br />

"Promotions."

This incentive utilization pattern was similar across the<br />

five battalions and for most companies. Most variation in the<br />

utilization patterns was across platoons. This finding may<br />

indicate that there was an attitudinal component to the ratings<br />

which may have biased their accuracy. Nonetheless, the overall<br />

responses of the soldiers, as well as the platoon mean<br />

utilization levels shown in Table 1, suggest that, on the whole,<br />

incentives are not as frequently or effectively utilized as they
might be.

A key focus of analysis in this research was to estimate the<br />

relationships between use of incentives, standard organizational<br />

variables, and platoon performance. The estimates of these<br />

relationships were needed to develop a working model of the<br />

interactions among the variables. Such a working model, in turn,<br />

was needed to develop a more thorough model for use in designing<br />

programs, tools, or interventions to enhance unit performance.<br />

In the analysis for this paper, the authors examined a set<br />

of standard organizational variables to find their relation to<br />

the use of incentives: 1) company learning climate, 2) job<br />

satisfaction, 3) platoon pride, 4) expectations that the NTC/JRTC

rotation would be valuable training, 5) motivation for the<br />



FIGURE 1. UTILIZATION OF INCENTIVES BY SQUAD MEMBERS

[Six bar charts giving the percentage of squad member responses in each
response category (1-5) for the incentives Public Recognition, Passes,
Awards, Training Course, Letter of Appreciation, and Promotion.]

KEY
1 = seldom used
2 = occasionally used, wrong person
3 = occasionally used, right person
4 = used often, wrong person
5 = used often, right person



Table 1. Overall Platoon Means and Standard Deviations for
Utilization of Incentives (N = 60 platoons)

Incentive                                        Mean     SD

Public recognition for a job well done (1)       2.6     .51
Passes (2)                                       2.8     .54
Awards (3)                                       2.5     .48
Specialized training courses (4)                 2.5     .52
Letters of appreciation or commendation (5)      2.4     .52
Promotions (6)                                   2.6     .50
Incentives - aggregated (7)                      2.6     .42

Table 2. Correlations Between Incentive Items and Organizational
Variables and Performance Criteria

Organizational Variables                Incentives (see Table 1)
& Performance Criteria            (1)    (2)    (3)    (4)    (5)    (6)    (7)

Learning Climate                  .60    .53    .56    .52    .38    .56    .74
Job Satisfaction                  .61    .43    .62    .53    .49    .68    .71
Platoon Pride                     .48    .54    .62    .40    .44    .46    .67
NTC/JRTC Expectations             .61    .43    .46    .40    .54    .55    .61
O/C Criterion Ratings             .19    .32    .45    .27    .18    .42    .39
Company Commander Ratings        -.05    .06    .29    .22   -.08   -.03    .09
Platoon Self Ratings              .09    .23    .25    .13    .005   .23    .19

Note: N = 60 platoons for correlations in first four rows; all
correlations p<.01. For O/C Ratings, N = 23 platoons: r values
of .32 or higher = p


rotation, and 6) general job motivation. These variables were

selected because it was felt that incentive utilization would<br />

affect or be affected by these variables and that the latter<br />

should directly impact upon unit performance. In particular, it<br />

was anticipated that the use of incentives would relate to both<br />

event (NTC/JRTC) motivation and general motivation.

Table 2 presents platoon level correlations between the<br />

utilization of incentives (specifically and in the aggregate) and<br />

four key organizational variable scales. The reader will note<br />

that the use of incentives in the aggregate was more strongly<br />

correlated with the organizational variables than were the six<br />

specific incentives. Of the specific incentives, "Public<br />

recognition", "Awards", and "Promotions" were the more strongly

correlated. Table 2 also presents the platoon level correlations<br />

between the utilization of incentives and the three types of<br />

performance criteria (O/C ratings, company commander ratings, and

platoon ratings). While a few of the correlations reached<br />

statistical significance, the correlations are not that strong,<br />

particularly in comparison with those between NTC/JRTC motivation<br />

and unit performance or between platoon pride and performance<br />

(presented later). Thus, as suspected, the utilization of<br />

incentives seems not to be strongly associated with good unit

performance but is strongly associated with other factors which<br />

more directly affect performance.<br />

Based on the pattern of highest inter-correlations and a<br />

little logic, the authors developed a tentative model describing<br />

how incentives might interact with other key organizational<br />

variables to impact upon platoon performance. The model, at this<br />

stage, must be considered only hypothetical; nevertheless, it

provides a good starting point for subsequent inquiry. The model<br />

is portrayed in Figure 2.

DISCUSSION<br />

While incentive utilization seems to play an important part<br />

in supporting variables directly impacting on unit performance,<br />

incentive utilization in the units examined was nonetheless low.<br />

This indicates both that leaders can more effectively use<br />

incentives and that, with more effective utilization, the<br />

numerical relationships found in this research might change.<br />

Since the aggregate use of incentives was more strongly

correlated with important organizational variables than the<br />

individual incentives, leaders may be able to shift from the use<br />

of constrained or slow-to-process incentives, or ones that take<br />

the soldier away from the unit (passes), to the use of incentives<br />

which are more efficient or effective (e.g., public recognition<br />

and awards). In interviews conducted at the post rotation, it<br />

was found that a major limitation on perceived incentive<br />

effectiveness was the length of time that occurred between the<br />

act or basis for the incentive and actual receipt of the<br />

incentive. Simply put, incentives should be used more and<br />

processed more quickly. If this is done, unit performance should
be significantly enhanced.



FIGURE 2. (TENTATIVE) INCENTIVE UTILIZATION IMPACT MODEL
With Direct Inter-Scale Correlations

[Path diagram; the legible box labels are "Leadership" and "Learning Climate".]

VARIABLES                    b.    c.    d.    e.    f.    g.   h.O/C  h.CO  h.PLT

a. Learning climate         .79   .74   .81   .60   .60   .71   .52**   .17   .30*
b. Platoon pride                  .67   .82   .53   .62   .74   .57**   .25   .26*
c. Incentive utilization                .71   .61   .55   .55   .39*    .09   .19
d. Job satisfaction                           .67   .74   .77   .65**   .23   .22*
e. NTC/JRTC expectations                            .82   .55   .55**  -.17   .08
f. NTC/JRTC motivation                                    .75   .65**   .16   .07
g. Job motivation                                               .63**   .37*   .31*

Number of platoons           60    60    60    60    60    60    23     42    58

* r = p<.05   ** r = p<.01


COHESION IN CONTEXT<br />

Guy L. Siebold<br />

U.S. Army Research Institute for the<br />

Behavioral and Social Sciences<br />

In the last few years, there has been a substantial amount<br />

of research on military unit cohesion. The research, by this<br />

author and others, has addressed some key questions: what is<br />

cohesion, how does it differ from similar constructs (e.g.,

bonding and morale), how can it be measured, what impact does it<br />

have, and how does it change over time. However, left relatively<br />

unaddressed are the questions of how cohesion is associated with<br />

other major job related and organizational constructs and which<br />

of these constructs, relative to each other, really makes a

difference in organizational performance. The research presented<br />

in this paper was designed to start to answer these latter two,<br />

unaddressed questions. Specifically, the research examined the<br />

association between unit cohesion and unit performance directly<br />

and in the context of the platoon average degree of job<br />

satisfaction and platoon level of training proficiency.<br />

Method and Sample<br />

Data were collected by questionnaire from soldiers (squad<br />

members, squad leaders, platoon sergeants, and platoon leaders)<br />

within five light infantry battalions at three points in time.<br />

The first point in time (Base) was 4-6 months before the
battalion was scheduled to go through a training rotation at
either the U.S. Army National Training Center (NTC), Fort Irwin,
CA or the U.S. Army Joint Readiness Training Center (JRTC), Fort
Chaffee, AR. The second point in time (Pre-rotation) was 2-4
weeks before the rotation; the third point (Post-rotation) was 2-
4 weeks after the training rotation. Questionnaires were

administered by researchers from the U.S. Army Research<br />

Institute.<br />

Base and pre-rotation questionnaires were given typically to<br />

one company of soldiers at a time in a classroom or dayroom

setting and, being up to 160 items long, took the average soldier<br />

about 30 minutes to complete after instructions. Soldiers<br />

responded on a machine readable answer sheet. Post-rotation<br />

questionnaires were short (21 items plus some unit and position<br />

identification questions) and took soldiers less than 10 minutes

to complete; responses were made on the questionnaire itself.<br />

Post-rotation questionnaires were given at the start of

The views expressed in this paper are those of the author
and do not necessarily reflect the views of the U.S. Army
Research Institute or the Department of the Army.


interviews to groups of soldiers in a company. All the platoon
leaders in a company were one group; all the platoon sergeants
were a second group; two thirds of the squad leaders were a third
group; and all squad members from one squad in the company formed

a fourth group. Post-rotation questionnaires, along with the<br />

subsequent group interviews, were usually conducted in an office<br />

or dayroom setting.<br />

The base and pre-rotation questionnaires contained scales<br />

measuring cohesion and other job related and organizational<br />

constructs along with various demographic items. The post-rotation

questionnaires focused on soldier perceptions (self<br />

ratings) of mission performance during their recent rotation. In<br />

addition, for the two battalions which rotated through the JRTC,<br />

ratings on leader and platoon performance were provided just<br />

after the rotation by the observer/controllers who observed each<br />

platoon during the rotation. In other words, the base and pre-rotation

questionnaires contained the home station determinants<br />

(predictors) of performance; the post-questionnaires and ratings

from the observer/controllers functioned as criterion measures of<br />

that performance. The total sample from the 5 light infantry<br />

battalions was 60 platoons: 45 line platoons, 5 scout platoons,<br />

5 mortar platoons, and 5 anti-tank platoons.<br />

Questionnaire items were structured to form scales measuring<br />

the constructs investigated. Scales addressed the following<br />

aspects of cohesion: squad member horizontal bonding, platoon<br />

leadership team (platoon leader, platoon sergeant, and squad<br />

leaders) horizontal bonding, vertical bonding between the squad<br />

members and the platoon leaders, platoon pride, and Army<br />

identification. Squad member horizontal bonding (SMHB) items<br />

measured whether squad members felt they cared about one another<br />

and worked together well as a team. Platoon leadership team<br />

horizontal bonding (LHB) items measured the extent to which the<br />

platoon leaders cared about one another and worked well together<br />

as a team. Vertical bonding (VB) items measured the extent to<br />

which subordinates felt their leaders were skilled and looked out<br />

for the needs of their subordinates. Platoon pride (PRIDE) items<br />

measured the extent to which members were proud of being in their<br />

platoon and played an important part in it. Army identification<br />

(AI) items measured the extent to which soldiers felt a part of<br />

the Army and that its successes were their successes. Soldiers<br />

responded to the cohesion questionnaire items using a five point,<br />

strongly agree to strongly disagree response scale.<br />

Scales also addressed constructs such as job motivation (job<br />

involvement), JRTC/NTC motivation, expectations of the value of<br />

JRTC/NTC training, job satisfaction, company learning climate,<br />

and level of task and mission training. As examples, the company

learning climate items measured whether soldiers were given a lot<br />

of responsibility, got feedback on how they were doing, and were<br />

helped to learn from their mistakes; the job satisfaction items<br />

measured whether soldiers felt their work was interesting and<br />

useful. Most of the scales, or earlier versions of them, had


been used in prior research and thus had known or expected

characteristics.<br />

For criterion measures of platoon performance, the ratings<br />

of three groups were used: observer/controllers (OCs) at the<br />

JRTC, company commanders (COs) rating their three platoons, and<br />

the platoon members. (PLT) themselves. The OC ratings were done<br />

at the JRTC after the rotation was completed; the CO and PLT

ratings were made during post-rotation data collection. Each<br />

rater rated each platoon, about whose performance he was<br />

knowledgeable, on its performance during each mission conducted<br />

(e.g., movement to contact, deliberate attack, and defense). A<br />

rater's average rating of the platoon across observed missions<br />

became the criterion score. Raters used a 4 point scale:<br />

Trained, Needs a little training, Needs a lot of training,<br />

Untrained. PLT ratings were computed by averaging criterion<br />

scores across the four positions (squad member, squad leader,<br />

platoon sergeant, platoon leader), i.e., equally weighted by<br />

position. Readers can contact the author for additional

information on any of the scales. The predictor data used for<br />

the analyses in this paper are from squad member responses only.<br />

Leader perspectives will be addressed in future analyses.<br />
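A small sketch of how those criterion scores could be computed follows. It is
hypothetical Python, not the authors' scoring code; in particular, the numeric
coding of the 4-point scale (Trained = 4 down to Untrained = 1) and the
function and variable names are assumptions made for illustration.

# Sketch of the criterion scoring described above; the numeric coding of
# the 4-point scale and all names are assumptions.
from statistics import mean

SCALE = {"Trained": 4, "Needs a little training": 3,
         "Needs a lot of training": 2, "Untrained": 1}


def rater_criterion_score(mission_ratings):
    """One rater's criterion score: mean rating across observed missions."""
    return mean(SCALE[r] for r in mission_ratings)


def plt_rating(position_scores):
    """PLT rating: equally weighted mean across the four positions.

    position_scores maps 'SM', 'SL', 'PS', 'PL' to that position's
    criterion score for the platoon.
    """
    return mean(position_scores[p] for p in ("SM", "SL", "PS", "PL"))


# Example: a platoon rated "Trained" on two missions and "Needs a little
# training" on one gets a criterion score of (4 + 4 + 3) / 3 = 3.67.
print(round(rater_criterion_score(
    ["Trained", "Trained", "Needs a little training"]), 2))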

Results<br />

Scales. The questionnaire predictor scales used in this research<br />

typically had means of about 3.1 - 3.6 on the five point response<br />

scale, with standard deviations around 1.0 at the individual<br />

respondent level and around .5 as averaged at the platoon level.<br />

Scale reliability estimates (alpha values) were typically around<br />

the .8 level. The platoon performance criterion scales had the<br />

following characteristics: OC ratings - Mean = 2.1, SD = .41;<br />

CO ratings - Mean = 3.2, SD = .43; PLT ratings - Mean = 3.2, SD =<br />

.32. Numbers of platoons rated were: OC = 23; CO = 42; PLT = 59.

Direct impact of cohesion. As noted in Table 1.a., all the

aspects of cohesion correlated significantly with platoon<br />

performance as rated by the OCs at JRTC and as rated by the<br />

platoon members. The cohesion - performance relationship based<br />

on CO criteria was in the same direction but at a lower, non-significant
level. Also, as noted in Table 1.b., the different

aspects of cohesion all correlated significantly with each other,<br />

although at a notably lower correlation coefficient level with<br />

the Army identification aspect. An initial factor analysis of<br />

squad member responses indicated that Army identification was a<br />

separate construct from the others, that squad member bonding was<br />

a separate construct, and that the other scales were linked to<br />

perceptions about the platoon leaders. Platoon pride loadings<br />

were split between the squad member and leader factors.<br />

Relation of cohesion to other constructs. As noted in Table<br />

l.c., other standard organizational constructs and level of<br />

training were related to the cohesion scales. In short, there<br />

-I -i ii


Table 1.a. Correlations Between Cohesion and Platoon Average
Mission Performance at JRTC or NTC by Rater of Performance

                          Performance Raters
Cohesion Scale          OC        CO        PLT

SMHB                   .52**     .31*      .30*
LHB                    .52**     .15       .38**
VB                     .47**     .18       .31**
PRIDE                  .57**     .25       .26*
AI                     .43**     .20       .24*

Table 1.b. Intercorrelations Among Cohesion Scales

            LHB      VB       PRIDE     AI

SMHB        .74      .63      .84       .54
LHB                  .77      .81       .47
VB                            .73       .45

Table 1.c. Correlations Between Cohesion and Standard
Organizational Constructs

                               Cohesion Scale
Construct                  SMHB    LHB    VB     PRIDE    AI

Job Motivation             .65     .66    .51    .74      .71
JRTC/NTC Motivation        .47     .49    .31    .62      .65
Job Satisfaction           .67     .72    .59    .82      .69
Learning Climate           .67     .81    .75    .79      .64
Task/Mission Training      .30     .31    .48    .31      .29*

Note: * = p<.05; ** = p<.01



was a great deal of inter-dependence among the predictor<br />

constructs. This, of course, led to some analytic concerns, in<br />

particular about whether the cohesion construct correlations with<br />

the criteria were independent or due to the influence of some<br />

underlying factor or other construct. An examination of all the<br />

construct inter-correlations and exploratory factor analyses<br />

suggested that there might be a soldier general perception of job<br />

conditions accounting for the level of inter-correlation. To<br />

investigate this possibility, the correlations were re-computed<br />

controlling for the mean platoon job satisfaction. The results<br />

are shown in Tables 2.a. and 2.b.<br />

Table 2.a. Partial Correlations Between Cohesion and Platoon
Average Mission Performance at JRTC or NTC by Rater of
Performance, Controlling for Job Satisfaction

                          Performance Raters
Cohesion Scale          OC        CO        PLT

SMHB                   .16       .21       .21
LHB                    .10      -.02       .32**
VB                     .15       .05       .23*
PRIDE                  .11       .11       .13
AI                    -.04       .06       .13

* = p<.05; ** = p<.01


with) strong platoon performance at the JRTC or NTC. Among all<br />

the predictor constructs, JRTC/NTC motivation and job motivation<br />

were the strongest correlates with the OC criterion ratings, .65<br />

and .63 respectively. Their correlations were also reduced when<br />

job satisfaction was controlled, to .34 and .27.<br />
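The partial correlations reported here are standard first-order partials. A
minimal sketch of the computation, assuming plain zero-order correlations as
input, is shown below; the numeric values in the example are hypothetical and
are not taken from the tables.

# First-order partial correlation r_xy.z: the x-y correlation with the
# control variable z partialled out. A generic formula sketch, not the
# authors' code.
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """Return r_xy.z from the three zero-order correlations."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical example: a cohesion-performance correlation of .52 that
# shares much variance with job satisfaction drops to about .20 once
# satisfaction is partialled out.
print(round(partial_corr(0.52, 0.67, 0.60), 2))  # 0.2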

Regardless of the large common variance among the predictor<br />

construct scales, a critical concern was whether the common<br />

variance among the predictor scales was due in part to the level<br />

of task and mission training. If this were the case, then the<br />

correlations with the criteria ratings could be simply an<br />

instance of high or low training at pre-rotation resulting in<br />

high or low performance at JRTC or NTC. To examine this, partial<br />

correlations were again computed controlling for the squad<br />

members' pre-rotation estimates of their platoon's level of task<br />

and mission training (measured using the same response scale as<br />

the criterion raters). The results are given in Table 3.<br />

Table 3. Partial Correlations Between Cohesion and Platoon
Average Mission Performance at JRTC or NTC by Rater of
Performance, Controlling for Pre-rotation Training Level

                          Performance Raters
Cohesion Scale          OC        CO        PLT

SMHB                   .49**     .32*      .22
LHB                    .49*      .16       .30*
VB                     .42*      .20       .18
PRIDE                  .55**     .27*      .17
AI                     .39*      .21       .16

* = p<.05; ** = p<.01     N = 20 (OC), 39 (CO), 55 (PLT)

As Table 3 shows, the partial correlations (with perceived<br />

training level controlled) are not much different from the direct<br />

correlations given in Table 1.a. Also, there was no or little<br />

change from the direct correlations for the other major predictor<br />

constructs, with training controlled. A fair interpretation of<br />

Table 3 would be that cohesion adds significantly to the mission<br />

performance of platoons at training centers such as the JRTC and<br />

the NTC beyond that portion of performance due to level of<br />

training. In other words, cohesion and other job related and<br />

organizational constructs provide a separate, important<br />

contribution to performance. Speculating from the data in this<br />

research, one can estimate that separate contribution to be in<br />

the range of 10 - 40% of the performance variance. Obviously,<br />

further research remains to be done in sorting out the nature and<br />

inter-relationships of the predictor constructs and in<br />

determining the constructs' relationship with performance across<br />

the range of construct values (e.g., low, medium, and high levels<br />

of the constructs).


EVALUATION OF THE ARMY'S FINANCE SUPPORT COMMAND
ORGANIZATIONAL CONCEPT

Raymond O. Waldkoetter, William R. White, Sr., and
Phillip L. Vandivier

U.S. Army Soldier Support Center<br />

Fort Benjamin Harrison, IN 46216-5700<br />

A new modular concept of organization was developed for the Finance Support
Command (FSC) missions/functions to provide direct financial support to commanders,
units, and activities on an area basis. An Army restructuring initiative resulted in the
reorganization of the Finance Corps' force structure, with the planned FSC being a modular
TOE that is sized, depending on the population supported, with two to six assigned finance
detachments. Before implementing the new organizational concept, a decision was made at
Headquarters, Department of the Army to conduct an evaluation to determine if the modular
concept FSC would have the capability to perform the minimum essential wartime tasks.
Those wartime tasks place the FSC as a focal point for providing commercial vendor and
contractual payments, various pay and disbursing services, and limited accounting on an
area basis. Finance units must also be prepared to protect and defend themselves to continue
sustainment of the force and maintain battle freedom for combat units to engage the enemy.

A study team identified the missions and functions to be performed by the FSCs in wartime.
The relationships of the FSC with organizations above, below, and parallel were outlined
along with the interactions between these organizations. The study team established
criteria to be met with current doctrine and “principles of support” and “standards of
service” as the foundation for battlefield finance support functions. A notional concept was
developed, staffed, and evaluated to determine the preferred FSC unit force structure. With
the assistance of subject-matter experts (SMEs), the capability of the preferred FSC design
and related functions was analyzed to address military finance support requirements for
various theater scenarios, across the spectrum of conflict and in different geographical
locations. Then, major Army commands (MACOMs) concurred with the recommendation
to adopt the modular concept FSC organization with the proviso that the concept be duly
evaluated prior to implementing actions.

The Soldier Support Center (SSC) hosted a MACOM level Finance Study Advisory
Group (SAG), 30 June - 1 July 1988, to further assess the proposed modular organizational
design. The SAG recommended fielding the modular design and conducting an on-site field
evaluation of the design prior to world-wide implementation. The Department of the Army,
Deputy Chief of Staff for Operations (DCSOPS), concurred with both recommendations.

The views expressed in this paper are those of the authors and do not necessarily reflect the
view of the Soldier Support Center or the Department of the Army.


This field validation complied with mandatory guidance directing field validation for doctrinal,<br />

training, organizational, leadership, and materiel products before operational use<br />

(SSC, 1989). The field validation was to determine, then, if the modular organizational<br />

design was capable of supporting battlefield requirements (SSC, 1990a). Validation methodology<br />

was based on approved operational and training evaluation procedures and coordination<br />

of critical issues and criteria (TRADOC, 1987) with MACOMs and G3/J3 staffs, an<br />

integral part of the training, force structuring, and TOE approval process for finance units.<br />

METHOD<br />

The field validation was designed to be a “self-evaluation.” Evaluation materials<br />

were provided to the participating MACOMs who selected finance SMEs to observe the<br />

FSC unit structure, while the FSC was conducting an operational exercise or training and<br />

performing wartime tasks under simulated conditions (Thornton III & Cleveland, 1990).

The SMEs were instructed to identify whether wartime missions/functions were in a category<br />

of “go”/“no go” or “unobserved,” according to the critical issues and related criteria<br />

they entered on the field validation data collection sheets. All “no go” situations were to be

explained as to which factor caused failure, such as doctrine, leadership, materiel, training,<br />

or organization. Required guidance and advisory assistance were furnished by the SSC<br />

throughout the evaluation.<br />

Major characteristics of the modular concept were to be operationally exercised<br />

during the field validation. It was to be determined that an acceptable level of wartime taskforcing<br />

and continuous operation is facilitated. Wartime and peacetime decentralized FSC<br />

detachment operations were to be effectively exercised with suitable support provided by<br />

the host unit. The designated issues and criteria were to be evaluated based on systematic<br />

SME observations during exercises or training. The SME evaluators were required to<br />

observe one FSC within their MACOM. The FSC and detachments were to be configured<br />

as described in the validation plan. It was requested every effort be made to control the<br />

wartime scenario so that realistic combat situations were experienced by the designated<br />

command and detachment personnel. The MACOM planning for a selected exercise/<br />

training sequence ensured that the SMEs were aware of the purpose of validation requirements<br />

and fully knowledgeable in finance wartime operations.<br />

The SMEs entered the FSC issues (three) and criteria (35, 21, and 2, respectively,

per issue) on the data collection forms and were instructed to keep the issues and criteria in<br />

numerical sequence, providing then 58 possible rating observations having “go”/“no go” or<br />

“unobserved” alternatives. Eleven SME evaluators collected validation data with five participating<br />

in Korea (26-28 Jun 89) and the other six at Fort Hood, TX (20-22 Sep 89). The<br />

five SMEs in Korea were from the 175th Theater Finance Command, 176th Finance Support<br />

Unit, and the six at Fort Hood, TX were composed of two evaluators from Forces<br />

Command (FORSCOM) Headquarters, Finance and Accounting Division (Fort McPherson,

GA) and four from Fort Hood, 3rd Finance Group, 502d Finance Support Unit. Three<br />

SSC referee-observers participated with the provisional FSC units in Korea and at Fort<br />

Hood to furnish whatever expertise might seem useful without causing any disruptive reactions.



The three critical issues were formulated to cover all major concerns regarding the<br />

expected operational capability of the FSC organization:<br />

3. Can the modular concept FSC organization transition from peace to wartime operations?

With the criteria requirements subsumed for each issue indicating responses of “go”/
“no go” or “unobserved,” and space for comments from the SME evaluators, the collected

responses were tested for significance using chi-squared. Although simple majority judgments<br />

are often employed to make decisions when deliberating on courses of action to be<br />

selected, it was decided that due to the operational consequences, the decision to adopt the<br />

FSC organization should be based on significant data comparisons to avoid any random or<br />

possibly biased observations.<br />
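The chi-squared comparisons reported in the next section are goodness-of-fit
tests on the observed “go”/“no go”/“unobserved” counts. A short sketch in
Python follows; the assumption that the expected distribution is an equal
split across the three categories is ours, but it reproduces the Issue 1
statistic reported below.

# Chi-squared goodness-of-fit sketch for the "go"/"no go"/"unobserved"
# counts, assuming equal expected frequencies across the three categories.
def chi_square_equal_expected(observed):
    n = sum(observed)
    expected = n / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Issue 1 counts: 150 GO, 12 NO GO, 124 UNOBSERVED (N = 286).
print(round(chi_square_equal_expected([150, 12, 124]), 2))  # 112.81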

RESULTS AND DISCUSSION<br />

The SME evaluators responded to Issue 1 and its criteria with 150 (GO), 12 (NO
GO), and 124 (UNOBS) observations. Compared to the expected distribution of responses,
it was found these responses were significantly different by chi-squared: X² (2, N = 286) =
112.81, p < .001. While there is definitely a significant difference among the three categories
of responses, only the difference between the “go” and “no go” would be significant. Even
though the difference between the “no go” and “unobserved” would be significant, the
meaning could not be clear since many comments related to the “unobserved” responses
inferred that the technical wartime missions/functions were feasible (“go”). There was an
overall impression that the technical wartime missions/functions can be performed, though
equipment and certain procedures may act as constraints. For Issue 1 there were more
“unobserved” responses than for the other two issues. Some “no go” responses resulted
from observations of deficient transportation assets and of lack of sufficient staffing. “Unobserved”
responses were further attributed to evaluators judging some tasks were feasible,
but resources were not available to operate during the training and field exercises. Technology
and staffing shortages were repeatedly cited as cause for non-evaluation (omitted) and
“unobserved” responses, with some missions/functions tending to become evaluated as “no
go” without specific available equipment/materiel. Again, many “unobserved” responses
acknowledged the potential validity of the “go’s.”

Issue 2 and its criteria showed evaluator responses of 146 (GO), 17 (NO GO), and 36
(UNOBS). Compared to the expected distribution of responses, it was found these responses
were significantly different by chi-squared: X² (2, N = 199) = 146.25, p < .001.
There is definitely a significant difference also among the three categories of responses


here, and the other differences between the “go” and “no go” and “unobserved” are highly
significant as well. There was a high degree of confidence that the tactical wartime missions/
functions can be performed as a result of the observed field validation training/exercises.
The “go’s” showed confidence related to maintaining unit strength and adequate logistical
and communication support. Responses of “no go’s” and “unobserved” pointed up training
and equipment concerns as did some non-evaluated (omitted) tasks. Company battlefield
tasks were considered feasible but some evaluators mistakenly omitted replies. Medical
care in the field, NBC, and security problems were anticipated, with the level of transportation
serving as a crucial balance between “go” or “no go” decisions.

Issue 3 and its criteria showed evaluator responses of 5 (GO), 1 (NO GO), and 7
(UNOBS). Compared to the expected distribution of responses, it was found these responses
were significantly different by chi-squared: X² (2, N = 13) = 4.31, p < .45. There is
a significant difference among the three categories of responses but not in favor of “go’s.”
However, there was sufficient reason to conclude the FSC organization will be able to
transition from peace to wartime operations. Some dissonance existed concerning how to
best prepare for the transition process, but “go” responses from the Korean evaluators in
the “most like” wartime setting indicated no difficulties in preparing for the transition. The
only “no go” reply disagreed with the status of the TOE unit garrison structure as a basis from
which to initiate an effective transition process. Some omitted responses resulted from the
units not being able to clarify the intent of this issue. Enough observations did result to help
modify guidelines for transitioning.

The three issues and related criteria when summed showed evaluator responses of<br />

301 (GO), 30 (NO GO), and 167 (UNOBS). With 140 SME responses omitted due to<br />

equipment availability, lack of criteria clarity, or redundancy of meaning, 78% (498) of the<br />

possible 638 responses were recorded for the field validation. Compared to the expected<br />

distribution of responses, it was found these responses were significantly different by chi-squared:
X² (2, N = 498) = 221.22, p < .001. Comparisons showed the SME evaluator “go”

responses were highly significant exceeding “no go” and “unobserved”, separately and<br />

combined. These results indicate that responses in favor of “go” judgments could hardly<br />

occur by chance, or by chance only once per thousand measures in similar data sets.<br />

CONCLUSIONS<br />

By aggregating the interrelated SME evaluator responses for the three issues and<br />

criteria, findings were derived describing useful observations to show the potential operational<br />

capability of the FSC modular organization.<br />

Comments from the 175th Theater Finance Command (Korea) forwarding their<br />

validation data supported the operational capability of the FSC Modular Concept. Suggestions
were given to improve operations by planning to solve equipment, personnel, and
transportation constraints.

The 3rd Finance Group (Fort Hood, TX) evaluation comments indicated the FSC


operational capability was validated to perform most of the essential technical and tactical
battlefield functions. It was noted that some deficiencies in communication equipment,
transportation, and staffing could limit the FSC in performing its battlefield mission as
described in the Finance Operations manual (FM 14-7, 1989). Also further noted by the 3rd
Finance Group were possible problems in the FSC transition from a peacetime to wartime
configuration, if it organizes and trains differently during peacetime than it is expected to

operate in wartime.<br />

Comments submitted with the FORSCOM validation data elements pointed out<br />

“that the FSC is a sound structure to provide finance support to commanders, units, and<br />

soldiers.” It was noted, however, “the proposed TOE is not designed with sufficient assets<br />

(personnel, communication equipment, vehicles) to operate tactically in a dispersed mode.”<br />

A trip report from the June 1989 visit of three SSC referee-observers to Korea described, from that early preview, most of the results that units later experienced in conducting the operational and training exercises to validate the FSC Modular Concept.

Their findings, drawn from their critique and review of training and field exercises, generally anticipated what other SME evaluators would experience. Based on the review and on-site

Korea and Fort Hood visits, SSC observers agreed that the FSC can be expected to accomplish<br />

the minimum essential wartime tasks under the modular concept with minor modifications<br />

in staffing and equipment (SSC, 1990b). With the suggested planning, the modular concept can facilitate a smoother transition from peacetime to wartime operations.

REFERENCES<br />

1. Finance Operations (FM 14-7). (1989). Washington, DC: Headquarters, Department of the Army.

2. Thornton III, G. C., & Cleveland, J. N. (1990). Developing managerial talent through simulation. American Psychologist, 45, 190-199.

3. U.S. Army Soldier Support Center (SSC). (1990a). Field Validation of the Finance Support Command (FSC) Modular Concept. Unpublished manuscript, Directorate of Combat Developments, Fort Harrison, IN.

4. U.S. Army Soldier Support Center (SSC). (1990b). Finance Materiel Requirements Study. Fort Harrison, IN: Directorate of Combat Developments.

5. U.S. Army Soldier Support Center (SSC). (1989). Personnel Service Command (PSC) and Finance Support Command (FSC) Field Validation Plan. Fort Harrison, IN: Directorate of Combat Developments.

6. U.S. Army Training and Doctrine Command (TRADOC). (1987). Handbook for Operational Issues and Criteria. Fort Monroe, VA: Advanced Technology (Reston, VA).


454


LEADER INITIATIVE: FROM DOCTRINE TO PRACTICE’<br />

Alma G. Steinberg and Julia A. Leaman<br />

U.S. Army Research Institute<br />

for the Behavioral and Social Sciences<br />

Introduction<br />

Initiative has been considered to be an important component of good leadership, especially military<br />

leadership (e.g., Headquarters Department of the Army, 1983; Rogers et al., 1982; Borman et al., 1987).<br />

However, there has been very little research on the actual practice of initiative by military leaders. This<br />

paper looks at leader initiative in Army combat units in terms of the relationship between leader initiative<br />

and unit performance, inhibitors of initiative, and approaches for developing leader initiative.<br />

Army doctrine is “what is written, approved by an appropriate authority and published concerning the<br />

conduct of military affairs” (Starry, 1984, p. 88). Two doctrinal publications define and describe leader<br />

initiative. One focuses on the Army’s doctrine for combat on the modern battlefield and is articulated in<br />

FM 100-5 (Headquarters Department of the Army, 1982). It reflects “the views of the major commands,<br />

selected Corps and Divisions and the German and Israeli Armies as well as TRADOC” (DePuy, 1984,<br />

p. 86). According to FM 100-5, initiative is something that large unit commanders must encourage in

their subordinates. Initiative means to “act independently within the context of an overall plan,” “exploit<br />

successes boldly and take advantage of unforeseen opportunities," "deviate from the expected course of

battle without hesitation when opportunities arise to expedite the overall mission of the higher force,” and<br />

“take risks” (p. 2-2).<br />

The second doctrinal publication addressing the importance of leader initiative focuses on military

leadership doctrine (Headquarters Department of the Army, 1983). Here initiative is defined as “the<br />

ability to take actions that you believe will accomplish unit goals without waiting for orders or<br />

supervision. It includes boldness” (p. 123). Emphasis is placed on the importance of communicating<br />

values, goals, and accurate information about the enemy and other factors that affect the mission to<br />

subordinates so that the subordinates, in turn, can use initiative to accomplish the mission when they<br />

are out of contact with the leader or higher headquarters.<br />

The data reported in this paper are from Army combat units. They were collected as part of a larger<br />

project conducted in support of the Center for Army Leadership and the Combined Arms Training<br />

Activity; the project focuses on determinants of small unit performance. Thus far, data have been<br />

collected from five light infantry battalions that went through rotations at the Army’s Combat Training<br />

Centers (CTCs). The goals of this project are to identify leadership and other factors important to unit<br />

effectiveness and readiness, and to develop interventions for improving these factors.<br />

Method<br />

The data presented here come from several sources. They include data collected from units just<br />

prior to their participation in a CTC rotation, data collected from units just after their participation in a<br />

CTC rotation, ratings of observer-controllers (OCs) at a CTC, and written take-home packages that<br />

provide feedback on unit performance at a CTC, as follows:<br />

(a) Pre-CTC questionnaire responses by squad members, squad leaders, platoon sergeants, and<br />

platoon leaders in battalions shortly before their CTC rotations.<br />

(b) OC ratings of CTC performance for two battalions.

‘The views expressed in this paper are those of the authors and do not necessarily reflect the<br />

views of the U.S. Army Research Institute or the Department of the Army.<br />

455<br />



(c) Take-home package observations on CTC performance, by OCs, for 12 CTC rotations which took place during 1988, 1989, and 1990.

(d) Individual and small group interview responses of squad members, squad leaders, platoon<br />

sergeants, platoon leaders, company commanders, battalion executive officers, and battalion<br />

commanders in five battalions shortly after they completed their CTC rotation.<br />

(e) Post-CTC questionnaire responses by squad members, squad leaders, platoon sergeants,<br />

platoon leaders, and company commanders in five battalions shortly after they completed their<br />

CTC rotation.<br />

Results

1. Soldier views of initiative.

Pre-CTC Questionnaires. Squad members, squad leaders, platoon sergeants, and platoon leaders<br />

from two battalions (n=600) were asked, “When the leaders in your unit talk about initiative, what do they<br />

typically mean?” About 85% of the respondents to this open-ended question indicated that initiative was<br />

seen as involving the performance of routine or SOP behavior, without being told and/or without being<br />

supervised. It involved the accomplishment of their own job or the job of the leader in his absence. The<br />

remaining 15% of the respondents indicated that when leaders encourage them to use initiative, they use<br />

initiative to mean: do what we tell you to do, do objectionable tasks (e.g., extra work, unpleasant tasks,

low-level work), and/or make the leaders look good.<br />

Post-CTC Interviews. Most of the responses to the post-CTC interviews (from five battalions)<br />

indicated that the respondents felt that initiative involved the activity of carrying out their jobs or taking<br />

over for an absent leader. Initiative was seen as the initiation or continuation of behavior without being<br />

told or without the supervisor’s presence. Several of the examples of initiative that were given involved<br />

the recognition of a problem and the request to a higher level leader to be permitted to follow a different<br />

course of action (that was still within the scope of their job). In addition, there were a few incidents of<br />

initiative reported which involved the recognition that something beyond one’s own immediate<br />

responsibilities needed to be done, and personally performing the necessary tasks to get it done. When<br />

asked directly about the importance of encouraging subordinates to carry out the commander’s intent by<br />

exploiting successes boldly, taking advantage of unforeseen opportunities, deviating from the expected<br />

course of battle, and taking risks, respondents indicated that these were not really high priorities. From<br />

battalion commander on down, they said, in essence, the main thing is to have subordinates at each<br />

level who are well disciplined, technically and tactically competent, and are motivated to do their jobs<br />

well.<br />

2. The relationship between leader initiative and unit combat performance.

Interview respondents indicated that they felt initiative (i.e., accomplishing the job without being told<br />

and/or without being supervised) is very important for success in combat. Pre-CTC questionnaires were<br />

examined to determine whether leader initiative was, in fact, a predictor of unit performance at a CTC.<br />

Table 1 shows that pre-CTC squad member ratings of the level of squad leader initiative are significantly<br />

related to OC ratings of platoon performance at a CTC. Similarly, pre-CTC squad member and squad<br />

leader ratings of platoon sergeant initiative are significantly related to OC ratings of platoon performance<br />

at CTC. However, pre-CTC ratings of platoon leader initiative by squad leaders, platoon sergeants, and<br />

company commanders are not significantly related to OC ratings of platoon performance at a CTC.<br />

Initiative, in the context of doing one’s job in the absence of being told or supervised, is not<br />

perceived as the same as motivation. For example, OC ratings of platoon motivation and platoon<br />

initiative at a CTC were not significantly correlated (r = .12). Neither were OC ratings of how hard the platoon worked and tried to do as good a job as possible significantly correlated with their ratings of platoon initiative (r = .19). Furthermore, OC ratings of platoon motivation and how hard the platoon worked were not significantly correlated with their ratings of platoon performance, whereas OC ratings of platoon initiative were related to OC ratings of performance (r = .45, p < .05). Even more support for the relationship between initiative and performance comes from the significant correlation (r = .44, p < .05)


between OC ratings of platoon initiative at a CTC and post-CTC ratings of platoon performance by

company commanders.<br />
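As an illustrative aside (not part of the original analysis), the p < .05 threshold for a Pearson correlation with n = 23 platoons can be checked in a few lines of code; the two-tailed critical value works out to roughly r = .41, which is consistent with the coefficients of .41 and above being starred in Table 1 while .31 is not. The snippet below is a sketch under that assumption.

    # Illustrative check, not from the paper: two-tailed critical r for n = 23 at alpha = .05.
    from scipy import stats
    import math

    n = 23
    df = n - 2
    t_crit = stats.t.ppf(1 - 0.05 / 2, df)             # two-tailed t criterion, df = 21
    r_crit = math.sqrt(t_crit**2 / (t_crit**2 + df))   # convert the t criterion to r
    print(round(r_crit, 2))                            # prints 0.41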

Table 1. Correlations Between Leader Initiative Rated Pre-CTC and Platoon Performance at CTC

Pre-CTC Initiative Ratings                        Correlation with Platoon Performance Rated by OCs
Of squad leader by squad members                  r = .41*
Of platoon sergeant by squad members              r = .44*
Of platoon sergeant by squad leaders              r = .43*
Of platoon leader by squad leaders                r = .31
Of platoon leader by platoon sergeant             r = .25
Of platoon leader by company commander            r = -.13

* p < .05; n = 23 platoons

3. Inhibitors of initiative.

In doctrine (Headquarters Department of the Army, 1983), the identified inhibitors of initiative are: lack of

understanding the mission, lack of accurate information, and lack of understanding the frame of<br />

reference (i.e., values, goals, and way of thinking) of the higher level leader and the subordinates. Figure<br />

1 provides a summary of the inhibitors of initiative mentioned in the CTC take-home packages, the post-<br />

CTC interviews, and the post-CTC questionnaires. As can be seen from Figure 1, the reported inhibitors<br />

of initiative cover a broad range of areas and provide additional inhibitors to those identified in doctrine.<br />

These include micromanagement, unit climate, concern about the reaction of others, fatigue, and lack of<br />

motivation.<br />

4. Approaches for developing initiative in subordinates.

In post-CTC interviews, leaders (squad leaders, platoon sergeants, platoon leaders, company<br />

commanders, battalion commanders) indicated that they do try to develop initiative in subordinates.<br />

They focus primarily at the squad member and squad leader levels and use the following approaches to<br />

develop initiative:<br />

(a) They develop the prerequisites. Leaders frequently mentioned three areas that they felt were<br />

prerequisites for showing good initiative: good discipline, proficiency in performing the job, and<br />

self-confidence. They try to develop the first two with training and the third, confidence, through

physical training (PT).<br />

(b) They tell subordinates to show initiative.<br />

(c) They provide opportunities for subordinates to perform the role of their leader. Typically<br />

squad members are told to take over for their squad leaders, either as temporary fill-ins or for<br />

developmental purposes.<br />

(d) They reward initiative. Those showing exceptional initiative during training exercises are<br />

nominated for awards.<br />

457


Figure 1. INHIBITORS OF INITIATIVE, By Source of Information

CTC Take-home Packages

- lack of information<br />

- poor operations orders<br />

- micromanagement<br />

Post-CTC Interviews<br />

- micromanagement and lack of trust<br />

- lack of information<br />

- not permitted<br />

- climate (lack of support for initiative, don’t make mistakes)<br />

- missions involving the larger unit (e.g., Bn as opposed to Plt)<br />

- no opportunity (e.g., in dead tent, OC restrictions)

- it’s safer for your career and shows more loyalty if you don’t<br />

let the higher leader know his ideas aren’t good<br />

Post-CTC Questionnaire (n = 322)<br />

34% Lack of relevant information<br />

31% Fatigue<br />

29% Concern about superior’s reaction<br />

24% Lack of understanding the mission<br />

20% Lack of a clear solution to the problem<br />

17% Lack of motivation<br />

16% Fear of making a mistake<br />

6% Desire to avoid being noticed<br />

5% Concern about subordinate’s reaction<br />

12% Other (e.g., too much changing of missions, inexperienced<br />

leaders, lack of time due to changes in plans or late<br />

operations orders, micromanagement)<br />

(NOTE: Percents do not add up to 100% because respondents were<br />

instructed to indicate all reasons that applied.)<br />

458


Conclusion<br />

This paper focused on leader initiative and looked at the relationship of leader initiative to unit<br />

performance. In this context, both doctrinal and field views of initiative were presented. Initiative, in the<br />

sense of doing one’s job without being told and/or being supervised, is seen as very important by<br />

soldiers and leaders. It was shown that leader initiative is significantly correlated with unit performance,<br />

and yet is clearly distinguishable from motivation. The fact that field views of the inhibitors of initiative<br />

are broader than those presented in doctrinal sources suggests that the information gained in this<br />

research might be of benefit to doctrinal proponents as well as those developing leader training courses<br />

or conducting field training.<br />

References<br />

Borman, W. C., Motowidlo, S. J., Rose, S. R., & Hanser, L. M. (1987). Development of a model of soldier effectiveness (ARI Technical Report 741). Alexandria, VA: U.S. Army Research Institute.

DePuy, W. E. (1984). Letter, General W. E. DePuy to General Fred C. Weyand, Chief of Staff, Army, 18 February 1976. In John L. Romjue, From active defense to airland battle: The development of Army Doctrine 1973-1982. Fort Monroe, VA: U.S. Army Training and Doctrine Command.

Headquarters Department of the Army. (1982). Operations (Field Manual 100-5). Washington, DC: Department of the Army.

Headquarters Department of the Army. (1983). Military Leadership (Field Manual 22-100). Washington, DC: Department of the Army.

Rogers, R. W., Lilley, L. W., Wellins, R. S., Fischl, M. A., & Burke, W. P. (1982). Development of the ... (ARI Technical Report 560). Alexandria, VA: U.S. Army Research Institute.

Starry, D. A. (1984). Commanders Notes No. 3, Operational concepts and doctrine, 20 February 1979. In John L. Romjue, From active defense to airland battle: The development of Army Doctrine 1973-1982. Fort Monroe, VA: U.S. Army Training and Doctrine Command.

459<br />



STARTING A TQM PROGRAM IN AN R&D ORGANIZATION<br />

Herbert J. Clark<br />

Brooks Air Force Base, Texas<br />

This paper reports the results of implementing a Total Quality<br />

Management (TQM) Program in an Air Force research and development

laboratory. It outlines how the Methodology for Generating<br />

Efficiency and Effectiveness Measures (MGEEM) was used to<br />

implement TQM, and describes the lessons learned in the process.<br />

The paper also gives guidelines for starting a TQM program and<br />

recommends using Organizational Development (OD) intervention<br />

techniques to gain acceptance of the program. Lessons learned<br />

stress the importance of choosing a skilled TQM facilitator,<br />

adequately training process action teams, and fostering open<br />

communications and teamwork to reduce resistance to change.<br />

GETTING STARTED<br />

People report they read the popular literature or hear a TQM<br />

briefing and come away with a general understanding of TQM<br />

philosophy, but no specific directions on how to get started.<br />

This condition is so common that, according to Kanji (1990), it

even has a name: ‘Total Quality Paralysis!'<br />

Kanji's solution for overcoming this problem is to follow a four-stage

TQM implementation procedure. It consists of collecting<br />

organizational information, getting top management support for<br />

TQM, developing an improvement plan, and starting new initiatives.<br />

Following these four steps leads to commitment from the top, a

united and coordinated middle management, and the data to make<br />

informed decisions -- essential conditions for TQM success.<br />

Behavioral scientists writing in the OD literature recommend similar procedures and have developed intervention techniques for gaining management support for new initiatives. They consist of educational activities, questionnaires, team building exercises,

and prescriptions of 'things to do' and 'things not to do.'<br />

French and Bell (1984) describe five types of interventions which

range from working with whole organizations to working with teams<br />

and individuals. These interventions can be used in TQM programs<br />

to increase participative management and intergroup cooperation.

Coupled with TQM tools such as statistical process control, they<br />

can lead to increased productivity, better product quality, and<br />

enhanced customer satisfaction. Trying to introduce TQM without<br />

considering the behavioral dynamics of the organization<br />

significantly reduces the chances for success, as illustrated

below.<br />

460


AN ILLUSTRATION<br />

In 1988, the Air Force Human Resources Laboratory (AFHRL) started<br />

a Total Quality Management (TQM) program in the laboratory. The technique used to implement TQM was the Methodology for Generating Efficiency and Effectiveness Measures (MGEEM) described by Tuttle

and Weaver (1986). MGEEM uses a group decision making technique<br />

to clarify an organization's mission, identify its customers,<br />

specify Key Result Areas (KRAs), and measure progress in the KRAs<br />

using mission effectiveness indicators. Air Force Regulation 25-5<br />

recommends using MGEEM to do TQM.<br />
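To make the shape of MGEEM's outputs concrete, the sketch below is a hypothetical illustration, not AFHRL's actual implementation; all names and example values are invented. It shows one way a mission statement, customers, Key Result Areas, and effectiveness indicators could be recorded for tracking.

    # Hypothetical illustration only: a minimal structure for MGEEM outputs.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Indicator:
        name: str
        readings: List[float] = field(default_factory=list)  # periodic measurements

    @dataclass
    class KeyResultArea:
        name: str
        indicators: List[Indicator] = field(default_factory=list)

    @dataclass
    class MGEEMRecord:
        mission: str
        customers: List[str]
        kras: List[KeyResultArea]

    # Example values are invented for illustration.
    record = MGEEMRecord(
        mission="Conduct human-resources research and development",
        customers=["Operational commands", "Training managers"],
        kras=[KeyResultArea("Technology transition",
                            [Indicator("Products delivered per quarter", [2, 3])])],
    )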

Despite top management support, the reaction to starting MGEEM at<br />

AFHRL was negative. Had there been a vote at the TQM start-up<br />

meeting, it is unlikely that a majority of the laboratory staff<br />

would have endorsed implementing TQM or MGEEM. The commander saw<br />

no reasonable alternative to MGEEM, however, so he directed its<br />

implementation.<br />

Twenty months after the program began, support for MGEEM was still<br />

weak. Of the 94 (out of 380) people responding to a laboratory TQM newsletter survey, 80% said TQM/MGEEM was of 'No Value' or 'Some Value.' Only 20% said it was of 'Moderate Value' or 'Significant Value.' Several written replies said to stop MGEEM. The attitude toward

the TQM philosophy was more positive.<br />

MGEEM was rejected because people in the laboratory did not have a<br />

sense of ownership in the program. Although division-level<br />

management participated in selecting the MGEEM KRAs and<br />

indicators, they did not support using MGEEM in an R&D laboratory.<br />

This attitude was passed on to lower levels of management, so few

people supported the program. This attitude prevailed, even though<br />

several of the scientists in the laboratory helped develop MGEEM.<br />

The finding that MGEEM was not widely accepted at AFHRL does not<br />

mean that it is an ineffective technique for implementing TQM.

Some observers felt that MGEEM was rejected prematurely and did<br />

not receive a fair test. Others felt that its rejection may have<br />

been more a consequence of how management introduced MGEEM than<br />

its methodology.<br />

Had AFHRL used OD intervention techniques while implementing TQM,

it is possible that they would have chosen a more acceptable TQM<br />

approach. Three OD techniques which could have been applied are<br />

survey feedback, the confrontation meeting (Beckhard, 1967), and

work teams. Advantages to this approach are that problem<br />

identification is based on survey data; top management and work<br />

teams define the problems and propose solutions; middle

management and workers develop the specific TQM procedures; and<br />

the survey data provide a reference point for surveys administered<br />

after changes have been made.<br />

When employed by a skilled facilitator, these techniques increase<br />

the chance of everyone developing a sense of ownership in the<br />

procedures adopted. TQM tools, such as cause and effect diagrams,<br />

461


are used to examine the processes associated with product quality<br />

after the group has accepted the need for change.<br />

LESSONS LEARNED<br />

In December 1989, Clark (1989) reported several lessons learned<br />

during the TQM program at AFHRL. The following is a summary of<br />

additional lessons learned.<br />

Facilitators. Facilitators must be familiar with TQM quality

improvement procedures and with OD techniques for gaining program<br />

acceptance. Facilitators should also be able to train people in<br />

TQM and OD. It is best to use facilitators who are not a part of<br />

the management group that is initiating TQM. Facilitators need<br />

the independence and authority to run the program as approved.<br />

Organizations sometimes appoint their own facilitator and conduct<br />

a do-it-yourself TQM program. An alternative is to hire a full-time,

thoroughly trained facilitator from outside the<br />

organization who can offer TQM alternatives. Facilitators should<br />

not impose their own philosophy on an organization or direct a<br />

specific TQM approach. The organization should develop its own TQM

approach based on its unique requirements.<br />

Process Action Teams (PATS). The role of PATS in a TQM program is to examine manufacturing and administrative processes and improve

the quality of service to the customer. Twenty months into the<br />

TQM program at AFHRL, 380 people were asked: How<br />

valuable are the process action teams at AFHRL? Twenty-one<br />

percent of the 94 people answering said, 'No Value.' Thirty-five percent said, 'Some Value'; 31% said, 'Moderate Value'; and

13% said, 'Significant Value.' These results were surprising<br />

because, throughout the TQM program, people said the PATS were the<br />

most effective and worthwhile part of the program. We expected<br />

more people to say the PATS were of significant value.

Written comments from the survey showed that people who said

PATS were of no value were either not aware of what the PATS were<br />

doing, felt the PATS created too much bureaucratic busy work, or<br />

thought the PATS were not addressing the right problems.<br />

People who rated them highly said the PATS increased<br />

communications, involved people from lower levels, and proposed<br />

effective solutions to problems.<br />

Most PATS at AFHRL worked on improving administrative procedures.<br />

There was less progress in improving the quality of the laboratory<br />

R&D product and customer satisfaction. PATS should spend a majcr<br />

portion of their time working on product improvement and customer<br />

Satisfaction. Excessive attention to administrative procedures<br />

can be a symptom of undue concern about management and too 11++"tile<br />

concern about customer satisfaction and product quality.<br />

PATS are not the solution to all problems. It is easy to defer decisions to a committee without exercising leadership. Some problems sent to PATS could be easily solved by management in half

the time.<br />

Training. Typical TQM training programs consist of lectures on the philosophies of Deming, Juran, Crosby, and other well-known quality advocates. There should be additional training on such subjects as participative management, customer interface, process control applications to non-manufacturing activities, and

statistical analysis.<br />

Process Action Teams need training on group participation skills, brainstorming, cause and effect diagrams, and other TQM tools. Most experts orient this training towards the task at hand, rather than towards people's feelings and personalities. Training in OD intervention techniques comes after management has decided which

OD techniques to apply. Unless people receive specialized<br />

training in OD and TQM, they do not know how to get underway.<br />

Communications. Good communication is fundamental to the success of any TQM program. Yet, many organizations have poor communications. Upward communication is poor because managers fail to listen. Downward communication is poor because managers want to protect their workers from what they consider to be irrelevant information. The result is a communication gap between managers and workers. Changes to this pattern can come about by recognizing the problem and training new behaviors through classroom discussion and leadership example.

Some organizations increase communications, openness, and teamwork by using newsletters. AFHRL started a newsletter halfway through its TQM program. The newsletter was distributed each month and invited everyone's participation. Informal conversations indicated there may have been more TQM discussions in the laboratory because of the newsletter. A newsletter keeps the importance of quality and productivity gains visible to management

and employees.<br />

Measuring Quality and Productivity. One way to measure quality and productivity in an R&D organization is to establish customer requirements, set goals, and measure progress towards reaching those goals in cooperation with the customer. Although this method is more appropriate for applied R&D projects than for basic research, it can be used for both. In an R&D organization, there is usually less emphasis on measuring scientific progress through

use of the traditional TQM statistical process control techniques.<br />

Surveys can be used to measure customer satisfaction.<br />
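As a hypothetical illustration of this goal-tracking idea (the measures, goal values, and survey numbers below are invented, not drawn from AFHRL), progress toward customer-agreed goals can be summarized as a simple percent-of-goal figure:

    # Hypothetical illustration: tracking progress toward customer-agreed goals
    # from periodic survey scores (names and numbers are invented).
    goals = {"on-time product delivery": 0.90, "customer satisfaction (1-5)": 4.0}
    latest = {"on-time product delivery": 0.72, "customer satisfaction (1-5)": 3.6}

    for measure, goal in goals.items():
        progress = latest[measure] / goal
        print(f"{measure}: {progress:.0%} of goal")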

Resistance to TQM. Some people resist any type of organizational change. They do not want to start a TQM program or any other program. They just want to be left alone to do their work. Others fear a loss of responsibility, while still others fear they

may get some. Reactions range from outright argument against TQM<br />

to stonewalling and simply waiting out current management.<br />

463


Management must listen, but also lead. If data show that<br />

organizational problems exist, open discussions should take place;

but it is up to top management to lead the organization. This<br />

does not prevent the use of OD techniques. In fact, the greater<br />

the problem and resistance, the greater is the need for OD. People<br />

become believers based on the enthusiasm, examples, ideas, and

data presented by management.<br />

Whatever TQM strategy and tactics are adopted, they must be

reviewed and updated at least once each year. This action<br />

accommodates criticism and conveys a sense of continually striving<br />

for improvement and acceptance of the procedures adopted.<br />

Labels. Labels, such as MGEEM, TQM, MBO, and Zero Defects, can

easily become scapegoats for people dissatisfied with a new<br />

management initiative. One way around this is to avoid using<br />

labels. The NASA Lewis Research Center, for example, calls its<br />

quality improvement program just that, a quality improvement<br />

program (Office of Management and Budget, 1990). Although NASA<br />

uses Deming principles and the ideas of other TQM experts, they<br />

intentionally avoid referring to their program as a Deming program<br />

or a TQM program. Their program is a combination of quality<br />

initiatives uniquely patterned for their organization. This may

be a good policy to adopt, since it can be more difficult to argue<br />

against a quality improvement program than a specific TQM program<br />

with a label.<br />

FADS<br />

Many who read this paper will be familiar with the long list of<br />

publications which tell how to improve organizational<br />

productivity, quality, and morale. A particularly good summary of<br />

fads has been published by John Byrne (1986). He tells in a very

entertaining way how fads come and go, and what are the latest<br />

fads. He says that too many modern managers are like compulsive<br />

dieters: trying the latest craze for a few days, then moving on (p. 58).

The theme of this paper is that things do not have to be that<br />

way. An initiative to increase productivity and quality can<br />

succeed and endure if people in the organization buy into it. First, they have to believe they need a change; then they have to agree to participate in the program. Because people are different and organizations are different, the approach must be tailored to

the organization.<br />

Success requires a qualified facilitator or change agent who can<br />

teach people how to work as teams. Additionally, all levels of<br />

management must endorse and actively sponsor the management<br />

change. Workers must have goals which are consistent with the overall goals of management. OD techniques can help gain the trust and cooperation needed to sustain a TQM program. All this takes time, patience, and considerable skill.

464


If TQM does not work as promised, we may have to admit that programs which rely on people's good will just won't work. As Ring Lardner, Jr. (1990) said about Communism in Eastern Europe: Communism, like Christianity, is good in theory but, given human nature, hard to put into practice. Perhaps the same can be said

about TQM.<br />

REFERENCES<br />

Beckhard, R. (1967, March-April). The confrontation meeting. Harvard Business Review, 45, 149-155.

Byrne, J. A. (1986, January). Business fads: What's in - and out? Business Week, pp. 52-58.

Clark, H. J. (1989, December). Total quality management: An application in a research and development laboratory (AFHRL-TP-89-58, AD-A215 808). Brooks AFB, TX: Special Projects Office, Air Force Human Resources Laboratory.

French, W., & Bell, C. H., Jr. (1984). Organization development: Behavioral science interventions for organization improvement. Englewood Cliffs, NJ: Prentice-Hall.

Kanji, G. K. (1990). Total quality management: The second industrial revolution. Total Quality Management, 1(1), 3-12.

Lardner, R., Jr. (1990, March 19). U.S. News & World Report, p. 27.

Office of Management and Budget. (1989). Quality improvement prototype. Unpublished document available through the NASA Lewis Research Center, Cleveland, OH.

Tuttle, T. C., & Weaver, C. N. (1986, November). Methodology for generating efficiency and effectiveness measures (MGEEM): A guide for Air Force measurement facilitators (AFHRL-TP-86-36, AD-A174 574). Brooks AFB, TX: Manpower and Personnel Division, Air Force Human Resources Laboratory.

465


32nd Conference of the Military Testing Association (MTA)
An officer, a social scientist (and possibly a gentleman) in the Royal Netherlands Army (RNLA).

Presentation by Col Dr. G.J.C. Roozendaal

Head, Behavioural Science Division<br />

Directorate of Personnel RNLA<br />

The Royal Netherlands Army<br />

In peacetime the Royal Netherlands Army has 78,000 employees, consisting of<br />

23,000 regular servicemen, 43,000 conscript personnel and almost 12,000

civilian employees. The RNLA can rapidly reach its wartime strength of<br />

200,000 men and women by calling up reserve personnel.<br />

The Royal Netherlands Army is a volunteer-conscript army, with national<br />

service (for men only) lasting 14 to 16 months. [!]

Women have the right to serve, but only as volunteers. In principle all<br />

posts are open to women.<br />

The Royal Netherlands Army has its own military psychological and social

service, comprising around 20 regular officers in the ranks from major up to<br />

and including brigadier-general.<br />

All these officers have graduated from a Dutch University in either<br />

psychology or sociology.<br />

Virtually all of these officers were given their basic training at the Royal<br />

<strong>Military</strong> Academy, after which they spent several years in active service as<br />

a platoon commander, company commander and/or a staff officer with an active

unit.<br />

Only then have most officers completed their training as social scientists.<br />

Military behavioural scientists occupy various posts in different fields of

work.<br />

Allow me to give you some examples:<br />

1. The personnel manager of the Directorate of Personnel RNLA is a<br />

brigadier-general psychologist.<br />

2. There are four colonels who act, amongst others, as:

- Head of the Behavioural Sciences Division;<br />

- Commander of the Didactics and Military Leadership Training Centre;
- Instructor at the Royal Military Academy;

- Head of the Individual Assistance Section.<br />

3. One lieutenant-colonel is Commander of the Selection Centre of the

Royal Netherlands Army.<br />

In addition there are another fourteen officers in the ranks of major and lieutenant-colonel who occupy a wide range of research, policy and assistance posts.

I shall now endeavour to use some examples to make it clear to you what exactly these officers do.


Research:<br />

In cooperation with a number of civil institutes, research is conducted<br />

into:<br />

a. Job satisfaction<br />

Every year a sample survey is carried out amongst 5% of regular personnel with regard to their well-being, motivation, their opinion on personnel policy and current matters.

Finally, they are also asked if they contemplate leaving the service.<br />

This year the questions concerned the military personnel’s opinion on<br />

the change in East-West relations, and on the planned reductions in

personnel.<br />

I am unfamiliar with your research experiences, but we at the RNLA have<br />

noticed that personnel have not lost their motivation for their military<br />

task, although they are uncertain as to whether their jobs will continue<br />

to exist. However, they are mainly of the opinion that the reductions

will not affect them personally, but rather a colleague elsewhere.<br />

This affects the advisory policy to be pursued with regard to reductions<br />

in personnel.<br />

b. Exit interviews<br />

Exit interviews are held with all personnel leaving the Royal Netherlands<br />

Army prematurely. This yields information for the organisation as<br />

to how it is valued as an employer and how personnel policy can best be<br />

altered.<br />

In this context, extra measures have been taken in order to increase<br />

the ability to retain technical personnel, doctors and other highly trained

personnel.<br />

The exit interviews also provide information enabling the policy to<br />

integrate women and ethnic groups to be adapted accordingly.<br />

Remarkably enough, the exit interview is now needed in order to<br />

determine which measures can be taken to achieve increased voluntary<br />

outflow in the light of the reductions in personnel - an interesting<br />

change in the use of exit interviews.<br />

c. Violence in the armed forces

Research was conducted recently into the occurrence of violent incidents

within the armed forces, covering all types of physical and/or mental<br />

violence ranging from coarse language, swearing, harassment and physical<br />

violence, right up to forms of sexual violence.<br />

The research showed that, fortunately, forms of serious sexual abuse<br />

are fairly rare.<br />

However, the research did lead to a series of recommendations<br />

as to how leadership qualities can be improved.<br />

These recommendations are now being implemented.<br />

d. Homosexuality research<br />

Following on from the above, research is currently being conduc:t.ed into<br />

the extent to which homosexual soldiers experience forms of discrimination.<br />

A sample survey is also being conducted amongst all soldiers with<br />

regard to their attitude towards homosexual colleagues. This survey is<br />

still in its initial stages, but it will certainly enjoy particularly<br />

strong political interest.<br />

467


e. Deployability research

On several occasions my division has conducted research into the effect<br />

of lengthy exercises on the deployability of regular and conscript<br />

military personnel.<br />

This research has resulted in the adjustment of the operational plans

formulated by the Netherlands Army Staff.<br />

f. Miscellaneous<br />

Without entering into any detail, I shall just mention research which<br />

my division has conducted regarding: the use of alcohol, drugs and<br />

gambling addiction, the integration of women in the RNLA, unit consultations,<br />

conscript NCOs, and so on.

Selection of personnel<br />

a. My division recently developed and implemented a personality test with<br />

a view to selecting prospective conscript personnel on their suitability<br />

for compulsory national service. This can limit the number of<br />

conscripts who dysfunction for psychological reasons.<br />

b. A procedure has been developed for prospective regular officers and<br />

NCOs to determine more accurately their suitability for the Royal<br />

Netherlands Army. This procedure uses personality studies and biographical<br />

data, compiled through standard interviews.<br />

All interviewers have been trained at length (approximately 2 years).

Validation studies have shown that a well-trained<br />

interviewer is far more capable of selecting the right man or woman for<br />

the army than is a study of his/her abilities.

c. A computerised 10-task test is currently being developed in cooperation

with the Institute for Perception (TNO).<br />

The test is designed to gauge both stress tolerance and capacity for<br />

multiple information processing. For validation purposes, these computer<br />

tests are now included in the existing test procedures.<br />

d. In a while I shall discuss the research we have conducted in order to


b. Our research into battlefield conduct has led to techniques now being

introduced, which can reduce the effects of battle stress.<br />

The techniques are applied at individual level in activities which are<br />

by their very nature stressful, such as parachuting, rock climbing or<br />

diving.<br />

Furthermore, commanders are thoroughly prepared for the effects of<br />

stress and battle stress, and they are taught how to recognise stress<br />

symptoms and how to act when faced with them.<br />

C. We have conducted intensive research into the effects of lack of sleep<br />

over a long period.<br />

Just one of the things this has revealed is how lack of sleep influences leadership.

After 48 hours' sleep deprivation the effectiveness of decisions taken<br />

declines dramatically. The factor causing the most concern is that<br />

commanders often do not realise, or realise only vaguely, that they are<br />

no longer capable of making responsible decisions.<br />

Symptoms of this kind have given rise to a great deal of attention<br />

being paid to the aspect of sleep management. A remarkable fact is that<br />

many commanders reject the<br />

implementation of sleep management, as they deem it<br />

un-military.<br />

However, we shall persevere.<br />

d. The RNLA has paid too little attention to Psychological Operations<br />

(Psycops) for too long. In fact, until recently the subject was not<br />

open to discussion on a political level, even in today’s free society.<br />

Psychological defence (preparing oneself for the adversary’s psychological<br />

operations) was all that was politically acceptable at that time.<br />

Recently however, the subject of Psycops has been attracting more<br />

attention, something which has been partly influenced by the attention<br />

we have paid to the effects of battle stress.<br />

For us this constitutes an interesting topic for research; one about<br />

which we think we can learn a lot from others.<br />

e. The attention given to battlefield behaviour, which I have already<br />

mentioned, has led to an entirely different structure of the treatment<br />

of combat stress victims in actual wartime conditions. Based on the<br />

well-known principles for treating combat stress victims:
- proximity (treatment at the front)
- immediacy (treatment as soon as the symptoms occur and as quickly as possible)
- expectancy (treatment to return the victims to active service),
we

formed a system of combat stress recovery units for our 1 (NL) Army<br />

Corps.<br />

The combat stress recovery units are located in the rear areas of the<br />

brigades and must be able to operate as mobile units.<br />

The battalion aid post could serve as a collection point for battle<br />

stress victims; this is where triage takes place.<br />

After triage the battle stress victims are treated in the battle stress<br />

recovery unit. The head of the battle stress recovery unit will be an<br />

officer from our psychological and social service, one of our trained<br />

psychotherapists in fact.<br />

469<br />




We estimate that 25% of all victims will be battle stress victims. Our objective is to return 50% of them to active duty within two days, and

ultimately 80% within seven days.<br />

The heads of the battle stress recovery units will also be able to act

as staff officers to the brigade commander.<br />

Their responsibilities will include the prevention of stress-related

problems.<br />

Our battle stress recovery organisation is not yet ready, but we aim to

have it operational by 1991.<br />

In order to prevent symptoms of PTSD (post-traumatic stress disorder),

analysis is now taking place in order to determine whether, in the<br />

event of the RNLA being deployed for peace-keeping operations, the<br />

assignment of military psychologists at a lower level would be worthwhile.<br />

Assessments<br />

f. Earlier this year we introduced a new procedure for assessing regular

personnel. In this system, more emphasis is laid on the influence of<br />

assessments on the management development system. In order to exclude<br />

undesired effects such as the unequal distribution of power, stereotyping<br />

and so on, every battalion now has a so-called assessment advisor.<br />

We have thoroughly trained these advisors, who are intended to support<br />

both the commanders and the individual soldiers to be assessed.<br />

This system was implemented at the beginning of this year.<br />

Evaluations to see whether the objective has been met will take place

in 1991.<br />

Education and training<br />

a. In 1985 the Didactics and <strong>Military</strong> Leadership Training Centre was<br />

established for the RNLA. One of my colleagues, a colonel and a

sociologist, is the commander of this centre.<br />

All future military instructors are trained at the centre, as are the<br />

so-called didactics specialists who are appointed to every training

centre.<br />

b. The centre also offers possibilities for leadership training to military<br />

commanders of all levels.<br />

Team-building procedures are developed and distributed from this<br />

training centre.<br />

c. Finally, the training centre plays a leading part in the use of new

teaching methods such as computer-based instruction, simulators, wargames,<br />

the training of social skills, and so forth.<br />

Care of dysfunctioning personnel

a. One of my colleagues, a colonel and a clinical psychologist/<br />

psychotherapist, is head of our Psychological Support Section.<br />

His section comprises three offices; the head of each of these offices

is a military psychologist.<br />

470


b. Every year these offices give psychological (psychotherapeutic)

aid to some 5,000 soldiers. Sometimes the problems are<br />

simple, and can be solved simply by advising a transfer. Sometimes,<br />

however, the problems are related to psychologically more complex<br />

matters involving alcohol, drug and gambling addiction, family problems<br />

and individual dysfunctioning. A number of these problems stems from<br />

traumatic service experiences, accidents, shooting incidents and the<br />

delayed effects of PTSD or battle stress.<br />

c. By virtue of this very experience, the psychologists from these offices

are exceptionally well deployable in the battle stress recovery units I<br />

described earlier.<br />

The integration of women in the RNLA<br />

a. As most of you are probably aware, all posts in the Netherlands armed<br />

forces have in principle been accessible to women since 1978 (including<br />

combat duties).<br />

I need hardly remind you that this does not mean there are no problems<br />

involved with the integration - on the contrary, there are.<br />

b. Some of these problems are obviously caused by the<br />

differences in physical strength and [powers of?] endurance between men<br />

and women.<br />

These facts are generally accepted, and are therefore open to discussion.<br />

Moreover, as is the case in the Royal Netherlands Army, effective<br />

policy measures can be<br />

implemented to cope with these differences.<br />

Allow me to give you an illustration. All military posts are classified<br />

according to physical demand.<br />

A method has been developed which measures physical strength in men and<br />

women. Because the job requirements for men and women are the same in<br />

principle, this results in there being relatively few women in physically<br />

demanding posts.<br />

c. One particular problem is that few women are prepared to enter into

long-term contracts. This was one of the reasons for our making all<br />

conscript posts - in principle - accessible to women.<br />

For many this was a way of getting to know the RNLA. For some this has<br />

already led to a job with the regular personnel.<br />

d. Measures have been implemented which until recently were highly<br />

controversial: parental leave has been introduced (for both men and<br />

women), day-care centres have been set up, part-time work has been<br />

introduced for soldiers, and women now have the opportunity (after a<br />

certain time) to return to the RNLA in order to resume a military<br />

career interrupted by parental duties.<br />

Officers from the psychological and social service have played a<br />

significant role in the implementation of all these measures.<br />

I myself have been very closely involved, as I am chairman of the<br />

working group responsible for the preparations for the integration of<br />

women (and also that of ethnic groups).<br />

471


e. Furthermore, I have been commander of the RNLA's selection centre for three years - I shall tell you more about this and the relevance it

bears to this conference.<br />

As you are aware, psychological tests show, on average, differences in

scores between men and women.<br />

In 1985 the tests gave the following differences in percentages of men

and women passing final selection: men 40%, women 20%.

Further research revealed that these differences in percentages were<br />

brought about mainly by the original version of our practical technical<br />

ability test.<br />

A mere 25% of all women came above the cut-off score, as opposed to 75%

of all men.<br />

f. This was one of the reasons behind our subjecting this test to a thorough item-bias study, which led to the test being drastically

adapted.<br />

Women still score lower results than men in this test, but the differences

are now much smaller: 50% of women and 75% of men now meet the<br />

demands set in this test.

We have also modified the procedures.<br />

Compensation and more differentiation according to position have now

been introduced, and we have adapted our<br />

personality test.s and the interview.<br />

Without going into too much detail, I can also tell you that we currently set the same demands and the same tests for men and women, with the same cut-off scores, based on the same number of correct answers.

Moreover, the percentages of men and women passing final selection are now almost equal, as the following illustrates: men 45%, women 40%.
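To put the improvement in a single figure (a comparison added here for illustration, not made in the presentation), the female-to-male ratio of final-selection pass rates rose from

    20% / 40% = 0.50 in 1985   to   40% / 45% ≈ 0.89 under the adapted procedure.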

g. This is the only way in which we can achieve our objective of women comprising 10% of the army by 1993. You will understand that this is no simple task for an organisation in which the same demands are set for both men and women, nor for one which is to be reduced by 30% over the next few years.

Reductions in personnel<br />

This brings me to the final point I wish to bring to your attention. Many<br />

armed forces will have to make considerable reductions over the next few years; as I have already mentioned, this will amount to 30% for the RNLA.

472


The question now is how to approach this, and how to help in one’s capacity<br />

as a military psychologist.<br />

Allow me to give you some examples of our contribution in this matter:<br />

Exit interviews: determining the ways of promoting the voluntary outflow<br />

of personnel.<br />

Out-placement: assisting in the search for a job outside the army.<br />

Information: advising on the policy to be pursued and its psychological<br />

consequences for personnel.<br />

Individual assistance: in cases where enforced discharge is unavoidable (etc.).

This is perhaps a rather gloomy note on which to finish, but it nevertheless<br />

illustrates how valuable such a widely-deployable military psychological and<br />

social service can be.<br />

473


Acceptance of Change<br />

An Empirical Test of a Causal Model

Edith Lynne Goldberg<br />

State University of New York, Albany<br />

John P. Sheposh<br />

Joyce Shettel-Neuber<br />

Navy Personnel Research and Development Center, San Diego. CA<br />

Abstract<br />

This study examined the effect of climate in combination with other factors on<br />

perceived value and acceptance of changes in three public sector (Department of<br />

Defense) organizations that had adopted new approaches to managing human<br />

resources. A conceptual model was proposed and tested to convey the interactive<br />

nature of the set of factors selected as important to acceptance of the changes.<br />

In general the hypothesized interrelationships were supported by the data. The

assessment of the specific changes during the period of implementation was<br />

influenced by organizational contextual factors (CLIMATE). The assessment of the<br />

specific changes, in turn, affected perceived consequences of the changes which<br />

influenced the desire to retain the changes. This last factor, which could be<br />

construed as intentionality, is considered an important underpinning or precursor<br />

to the final stage of institutionalization. The combination of predictors in the<br />

model accounted for 56% of the variance. Theoretical and applied issues were<br />

discussed and future research suggested.<br />

In response to need or opportunity, organizations put into place planned changes that alter<br />

or replace existing procedures, products, processes, and/or policies. The implementation phase-<br />

-what happens after a decision has been made to adopt the change--is a critical period in the<br />

success of the change. The research that has been accumulated on implementation has reported<br />

numerous failures (Bardach, 1977; Schultz & Slevin, 1975); therefore, research which could lead

to the identification, examination, and better understanding of factors important to the<br />

successful implementation of organizational change is needed.<br />

A wide range of factors could plausibly influence the implementation of a change in an<br />

organization. Certain factors have been identified by most experts on the subject as playing a<br />

significant role in the adoption and implementation of change (cf. Sheposh, Hulton, & Knudsen,

1983). One of the major factors that has been cited as influencing implementation and<br />

institutionalization of change is the organizational climate of the adopting unit (Glaser, 1972).

In general, climate has been regarded as a perception of the organization by its employees which<br />

is shaped by experiences within the organization. Climate is viewed as influencing the behavior<br />

of organizational members, distinguishing one organization from another, and enduring over time (Gordon & Cummins, 1979; James & Jones, 1979; Schneider & Snyder, 1975).

This paper reports on research which examined the effect of climate in combination with other factors on perceived value and acceptance of changes in three public sector (Department of Defense) organizations that had adopted new approaches to managing human resources. A conceptual model was proposed and tested to convey the interactive nature of the set of factors that were selected as important to acceptance of the changes. The model, the variables

comprising the model, and the proposed causal linkages are presented in Figure 1.<br />

The opinions expressed in this paper are those of the authors, are not ~t‘fic~a!. .y. p. :J ;j: .; ‘: :<br />

necessarily reflect the vie\vs of the Navy Department.<br />



Figure 1. Proposed model of acceptance of institutionalization of change.

According to the model, LEVEL in the organization (i.e., first-line supervisors, managers)<br />

and CLIMATE represent exogenous variables. LEVEL was included because descriptions of

organizational climate differ among hierarchical levels within an organization. Payne and<br />

Mansfield (1973), for example, reported that those individuals who were higher on the<br />

organizational hierarchy tended to perceive their organization as more democratic, friendly, and<br />

ready to innovate than those who were lower. As conveyed in the model, CLIMATE has a direct<br />

influence on specific aspects of the three changes that were being implemented. Organizational<br />

climate was expected to affect the extent to which specific changes produce benefits, because a<br />

change is more likely to succeed in an organization where the climate is open, accommodating<br />

to change, and in general positive. The combined effects of the specific changes in turn should<br />

significantly affect (increase or decrease) managers’ and supervisors’ ability to manage<br />

personnel-related matters in their work (CONSEQUENCES OF CHANGES). These perceived effects<br />

were expected to have a direct bearing on their willingness to institutionalize the particular set<br />

of changes that were being implemented (ACCEPTANCE OF INSTITUTIONALIZATION OF CHANGE).<br />

This index was included because previous research (Berman & McLaughlin, 1978) suggested that<br />

the question of institutionalization of a change is distinctly separate from that of<br />

implementation. Berman and McLaughlin concluded that initial adoption of a change does not<br />

ensure implementation nor does successful implementation necessarily ensure continuation of the<br />

change. It was hypothesized that in this study the perceived success of the changes during the<br />

implementation phase, as gauged by assessment of specific aspects of the changes (SPECIFIC<br />

CHANGES), and the perceived consequences (CONSEQUENCES OF CHANGES), which are to some<br />

extent determined by the climate of the organization will tend to produce broad based support,<br />

which would be instrumental in promoting the continuation of the changes (ACCEPTANCE OF<br />

INSTITUTIONALIZATION OF CHANGE). .<br />

Method and Procedures<br />

Organizations

Three Department of Defense (DOD) organizations, which provide logistical support for the<br />

armed services, served as research sites. Their functions include storing, shipping, and issuing<br />

materials and monitoring contracts with private sector businesses. They are staffed by civil

service employees and a few military officers in top management positions.<br />

Subjects

The data in this study were based on the questionnaire responses of a random sample of<br />

211 supervisors and managers from first-line level and above.<br />

Innovations<br />

As part of a 3-year experiment designed to improve human resource management, a

package of three changes was proposed and implemented at each of the three sites. One change<br />

involved the Delegation of Classification Authority to line management, allowing those most<br />

familiar with positions under them to assign series and grades to jobs rather than having<br />

personnelists do so. The second change, Nonpunitive Discipline, was established to substitute letters of warning for reprimands and short suspensions. The initiative was intended to improve

475


supervisor-subordinate relations, make employees take responsibility for correcting problem<br />

behavior, and save money and productivity lost to suspensions. The third initiative, the<br />

Elimination of Mandatory Interviews, removed an agency requirement that all candidates for a<br />

job be interviewed and allowed appointing officials to interview some, all, or none of the<br />

candidates for a position after reviewing their written applications.

Materials<br />

A questionnaire, developed to measure respondents’ perceptions of climate and the specific<br />

changes, was administered one year after program implementation began. The first part of the

instrument included questions regarding demographic characteristics of the respondents and<br />

perceptions of organizational climate. Organizational climate was adapted from several<br />

questionnaires (Gordon & Cummins, 1979; Siegel & Kaemmerer, 1978; Mowday & Steers, 1979;<br />

and Young, Riedel, & Sheposh, 1979). It consisted of 47 items which represented nine organizational dimensions (e.g., organizational climate, management style, organizational effectiveness). Seven-point scales were used for all dimensions except organizational effectiveness, which was measured on a nine-point scale.

The second half of the survey assessed the specific changes and related issues. Three

aspects of the changes were addressed. First, a set of items using 7 point scales was developed<br />

to assess the specific initiatives. For example, the ease, efficiency, and fairness of the<br />

Elimination of Mandatory Interviews initiative was measured by three items employing 7-point<br />

response scales. Second, perceived consequences resulting from the specific changes (e.g.,<br />

augmented authority and increased ease in carrying out personnel actions) were measured with 7<br />

point scales. Third, the general acceptance of the initiatives, preference for these changes over<br />

the old system, and the extent to which respondents wanted the changes to continue were<br />

assessed with three items employing 5 point scales.<br />

Results<br />

The mean responses for the components comprising the model for first-line supervisors and<br />

managers and for the overall sample are presented in Table 1. In general managers gave a<br />

slightly more positive assessment of the organization’s climate, the individual changes, the<br />

perceived consequences, and the acceptance of the institutionalization of the changes.<br />

Significant differences between supervisors' and managers' ratings were obtained for ELIMINATION OF MANDATORY INTERVIEWS (F = 7.12, p < .01) and for ACCEPTANCE OF THE INSTITUTIONALIZATION OF THE CHANGES (F = 13.11, p < .01).


As shown in Table 1 the supervisors and managers assessed each of the specific changes<br />

favorably. For example, they agreed that the elimination of mandatory interviews is easy to

carry out, results in fair selection of candidates, and results in positions being rapidly filled.<br />

Similarly, the supervisors and managers perceived benefits resulting from the combined changes<br />

(e.g., perceived increases in their authority to influence classification decisions, the overall<br />

productivity of the work teams). They did not perceive differences with respect to meeting job<br />

responsibilities or filling positions as a result of the inception of these changes. Finally,<br />

concerning the institutionalization of the changes, managers and supervisors are positive about<br />

these changes, prefer the new system over the old, and would like to see the changes continued<br />

in their work setting.<br />

Figure 2. Model of acceptance based on path analyses.

A path analysis was applied to determine the correspondence between the data and the<br />

proposed model as described in Figure 1. The results of the path analysis are presented in<br />

Figure 2 and the correlation matrix underlying the analysis are presented in Table 2. The<br />

ordering of the variables and their interrelationships as presented in Figure 2 generally<br />

correspond to the structure of the proposed model. As hypothesized, the LEVEL variable is most strongly related to CLIMATE, which in turn directly influences the three changes. Assessment of the three changes has a significant relationship with CONSEQUENCES BENEFITS but not with CONSEQUENCES EFFORT. The differentiation of consequences into two types was made on the

basis of a factor analysis which generated two independent factors. Both sets of consequences<br />

are significantly related to ACCEPTANCE OF THE INSTITUTIONALIZATION OF THE CHANGES with the<br />

CONSEQUENCES BENEFITS clearly the strongest predictor. The model accounted for a good<br />

amount of the total variance (R2 =.56). In addition to the absence of a significant relationship<br />

between the specific changes and the CONSEQUENCES EFFORT variable, the ordering of effects<br />

that were obtained for ELIMINATION OF MANDATORY INTERVIEWS departed from the proposed<br />

model. The direct relationship between this change and acceptance was stronger than the<br />

relationship of this variable to consequences.<br />
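For readers who wish to reproduce this style of analysis, the following sketch shows one conventional way to estimate path coefficients for a recursive model of this kind: a series of ordinary least squares regressions in which each endogenous variable is regressed on its hypothesized antecedents. The file name and column names are hypothetical and the equations are illustrative; this is not the authors' original analysis code.

    # Minimal sketch of a path analysis for a recursive model, assuming a
    # hypothetical survey file with columns named after the model components.
    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("acceptance_survey.csv")  # hypothetical data file

    def path_coefficients(data, outcome, predictors):
        """Regress one endogenous variable on its hypothesized antecedents
        and return standardized path coefficients (betas) and R-squared."""
        z = (data - data.mean()) / data.std(ddof=0)   # standardize all columns
        X = sm.add_constant(z[predictors])
        fit = sm.OLS(z[outcome], X).fit()
        return fit.params.drop("const"), fit.rsquared

    # One equation per endogenous variable, following the proposed causal ordering.
    climate_paths, _ = path_coefficients(df, "climate", ["level"])
    change_paths, _ = path_coefficients(df, "elim_interviews", ["climate"])
    benefit_paths, _ = path_coefficients(df, "conseq_benefits",
                                         ["delegation", "nonpunitive", "elim_interviews"])
    accept_paths, r2 = path_coefficients(df, "acceptance",
                                         ["conseq_benefits", "conseq_effort"])
    print(accept_paths, r2)   # r2 plays the role of the reported R2 for acceptance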

477<br />

I



Table 2
Zero-Order Correlations for Model Components

1. Level
2. Climate
3. Delegation of Classification Authority
4. Letters of Warning
5. Elimination of Mandatory Interviews
6. Consequences (Effort)
7. Consequences (Benefits)
8. Acceptance for Innovation

(The correlation entries of Table 2 are not legible in the available copy of the proceedings.)

Note. * p < .05, N = 211.

Regression analyses were replicated by employing LISREL (Joreskog & Sorbom, 1984). This approach uses equations with more explicit specifications and simultaneous estimates of hypothesized underlying relationships and unexplained variance. LISREL provides a more holistic approach in comparison to separate regression analyses (Bagozzi & Phillips, 1982) and served to test the goodness-of-fit of the model in this study. The variable LEVEL did not meet the specifications of the model and could not be entered as a component. The model yielded a goodness-of-fit (GFI) measure of .94 with an adjusted GFI (AGFI) of .93, and a root mean square residual (RMSR) equal to .15. This model appears to be a very reasonable explanation of the relationships between these variables and their ability to predict acceptance for innovation.

While the data were generally consistent with the model, there were some discrepancies. Contrary to expectations, all these changes were found to exert a direct effect on ACCEPTANCE as well as having a direct effect on CONSEQUENCES. It appears that the changes and CONSEQUENCES are neither empirically distinct nor do they seem to function in an exactly similar fashion. The other not readily explainable departure from the proposed model is the lack of significant relationships between CONSEQUENCES (EFFORT) and the factors hypothesized as the determinants of this factor.

Conclusions

The present research proposed and tested a model that incorporated components hypothesized as relevant to the assessment of the status of a set of changes being implemented. In general, the hypothesized interrelationships were supported by the data. The assessment of the specific changes during the period of implementation was influenced by organizational contextual factors (CLIMATE). The assessment of the specific changes, in turn, affected perceived consequences of the changes which influenced the desire to retain the changes. This last factor could be construed as intentionality, an important underpinning of or precursor to the final stage of institutionalization. The combination of predictors in the model accounted for 56% of the variance.

Several conclusions are evident from results based on using regression and structural equations. Consistent with past research (Glaser, 1973), hierarchical level and organizational climate were found to be important factors for predicting acceptance of change, but, as the present results suggest, they operate as indirect rather than direct predictors. The pattern of results thus suggests that simple bivariate correlations cannot adequately capture the CLIMATE - ACCEPTANCE OF CHANGE relationship. In addition, the present model suggests that a combination of general information about the organization (e.g., LEVEL, CLIMATE) and more specific information about outcomes brought about by the changes is necessary to better understand and assess the status of the changes under study and to better predict their future acceptance.




The results have several implications from an applied perspective. First, the use of<br />

measures assessing aspects of the organizational context and its relationship to the perceived<br />

value of the changes underscores the necessity to consider not only the specific features of the<br />

changes but also how the organization operates and functions in the cultivation and promotion<br />

of the changes. Second, the measurement of the changes in terms of their ability to produce<br />

certain expected outcomes is useful in determining the extent to which the changes are<br />

operating as intended. This information can be helpful particularly in the formative stage of an<br />

evaluation when providing feedback to those implementing the changes. Third, as the<br />

implementation process continues and evolves over time the predictive ability of the model can<br />

be determined. To the extent this model successfully predicts the status of the changes it can<br />

then be used during implementation of other changes that are introduced into an organization.<br />

In summary the proposed model, comprised of variables selected on the basis of theoretical<br />

considerations as well as the nature of the changes that were introduced and implemented, has

shown promise as a framework for predicting and understanding the implementation and<br />

acceptance of change in an organizational setting. There are recognized limitations in this<br />

study. It is clear additional research is required. Continued assessment of the changes over<br />

time is needed to ascertain the predictive effectiveness of the model. Additional testing of the<br />

model on other types of changes and in other organizations is called for in order to determine<br />

its effectiveness.<br />

REFERENCES<br />

Bagozzi, R.P., & Phillips, L.W. (1982). Representing and testing organizational theories: A holistic construal. Administrative Science Quarterly, 27, 459-489.

Bardach, E. (1977). The implementation game. Cambridge, MA: MIT Press.

Berman, P., & McLaughlin, M.W. (1978, May). Federal programs supporting educational change, Vol. VIII: Implementing and sustaining innovations. Santa Monica, CA: Rand.

Glaser, E.M. (1973). Knowledge transfer and institutional change. Professional Psychology, 4, 434-444.

Gordon, G.G., & Cummins, W. (1979). Managing management climate. Lexington, MA: Lexington Books.

James, L.R., & Jones, A.P. (1979, April). Perceived job characteristics and job satisfaction: An examination of reciprocal causation (Report No. 79-5). Fort Worth, TX: Texas Christian University, Institute of Behavioral Research.

Joreskog, K.G., & Sorbom, D. (1984). LISREL VI: Analysis of linear structural relationships by the method of maximum likelihood. Chicago: National Educational Resources.

Mowday, R.T., Steers, R.M., & Porter, L.W. (1979). The measurement of organizational commitment. Journal of Vocational Behavior, 14, 224-247.

Payne, R.L., & Mansfield, R. (1973). Relationship of perceptions of organizational climate to organizational structure, context, and hierarchical position. Administrative Science Quarterly, 18, 515-526.

Schneider, B., & Snyder, R.A. (1975). Some relationships between job satisfaction and organizational climate. Journal of Applied Psychology, 60(3), 318-328.

Schultz, R.L., & Slevin, D.P. (1976). Implementation and organizational validity: An empirical investigation. In R.H. Kilman, L.R. Pondy, & D.P. Slevin (Eds.), Management of organization design. New York: Elsevier North-Holland.

Siegel, S.M., & Kaemmerer, W.F. (1978). Measuring the perceived support for innovation in organizations. Journal of Applied Psychology, 63(5), 553-562.

Sheposh, J.P., Hulton, V.N., & Knudsen, G.A. (1983, February). Implementation of planned change: A review of major issues (NPRDC TR 83-7). San Diego, CA: Navy Personnel Research and Development Center.

Young, L.E., Riedel, J.A., & Sheposh, J.P. (1979). Relationship between perceptions of role stress and individual, organizational, and environmental variables (NPRDC TR 80-8). San Diego, CA: Navy Personnel Research and Development Center.

479


TWEEDDALE, J. W. (Chair), Chief of Naval Education and Training,<br />

Pensacola, FL.<br />

Annually, approximately 40,000 prospective college students request<br />

information on the NROTC scholarship program. About 12,000<br />

individuals apply and become finalists for NROTC scholarships.<br />

Four-year scholarships are ultimately awarded to approximately 1,500<br />

of the applicants. The scholarship pays for tuition, textbooks,<br />

instructional fees, and summer training periods, as well as provides<br />

the selectee with $100 per month (for a maximum of 40 months).<br />

Selectees may become a member of any of the 66 NROTC units that<br />

service over 120 colleges and universities located nationwide.<br />

The presentations in this symposium describe the procedures used to<br />

select NROTC scholarship recipients. CDR Bob Hawkins of the Naval

Education and Training Program Management Support Activity will<br />

present an overview of the NROTC selection process. Jack Edwards of<br />

Navy Personnel Research and Development Center will present a paper<br />

that was coauthored with Regina Burch (Colorado State University) and<br />

Norman Abrahams (Personnel Decisions Research Institutes, Inc.). He<br />

will review the steps used to revise the NROTC selection composite.<br />

Third, Wally Borman from the University of South Florida will discuss<br />

a recently developed, behaviorally anchored selection interview and a<br />

newly constructed biodata instrument. Finally, I will highlight the<br />

current and future research objectives for the NROTC scholarship<br />

selection system.<br />

TWEEDDALE, J. W., Chief of Naval Education and Training, Pensacola, FL

Improving procedures for the selection of future officers is complicated by the longitudinal nature of the research. For

example, if the criterion is whether an individual will remain<br />

following completion of obligated duty, it may take 8 to 10<br />

years for the criterion data to become mature. Also, the<br />

divergent criteria (college grade point average, grade point<br />

average in naval science courses, and military performance<br />

while in NROTC and later in the Navy) used to assess the<br />

accuracy of the NROTC scholarship selection system may present<br />

problems.<br />

The need to monitor the validity of the current predictors and<br />

develop new predictor and criterion measures are but two of the<br />

research needs currently confronting NROTC researchers. In<br />

addition to capturing readily quantifiable information, efforts

have been put forth to capture various other characteristics of<br />

"the whole person." Now, researchers, CNET staff, and<br />

Professors of Naval Science are examining ways to operationalize,<br />

measure, and validate those characteristics. The<br />

whole-person model will continue to guide NROTC scholarship<br />

selection research in this time of change for the Navy.<br />

480



GATHERING AND USING NAVAL RESERVE OFFICERS TRAINING CORPS<br />

SCHOLARSHIP INFORMATION<br />

Robert B. Hawkins, Commander, U.S. Navy<br />

Naval Education and Training<br />

Program Management Support Activity<br />

Naval Air Station, Pensacola, FL<br />

Introduction<br />

The responsibility for identifying potential Naval Reserve<br />

Officers Training Corps (NROTC) Scholarship applicants,<br />

processing applications, identifying scholarship winners, and<br />

then placing those selected at one of the 66 host or the more<br />

than 100 associated crosstown affiliated NROTC universities is<br />

divided between two separate Navy commands: the Commander,<br />

Navy Recruiting Command (CNRC) and the Chief of Naval Education<br />

and Training (CNET).<br />

Until the 1986/87 NROTC scholarship year, CNRC identified<br />

applicants, processed applications, and selected NROTC<br />

Scholarship winners. Scholarship winners were identified<br />

during two week-long selection board sessions, an early<br />

selection board held in November, and a second board in<br />

February. CNET then took the administrative action required to<br />

determine final program eligibility (physical qualification)<br />

and provided the authorization for a selectee to attend an<br />

NROTC university under scholarship.<br />

Responsibility for selecting NROTC Scholarship recipients was<br />

transferred from CNRC to CNET after the November 1986 early<br />

selection board. CNET then instituted weekly selection boards<br />

to replace the standard two-selection-board process. Selection<br />

board membership remained essentially the same, with selection<br />

board members drawn from NROTC units (commanding officers) and<br />

the NROTC staff. However, unlike the two-selection-board

system where the same board members evaluated all applicants<br />

during a single session, a weekly selection board process<br />

required the use of different selection board members for each<br />

selection board. Thus, concerns about scoring consistency and<br />

the equity of evaluation had to be addressed.<br />

Application Solicitation<br />

CNRC begins the applicant identification process in March each<br />

year. The primary target market is the high school junior<br />

(rising senior) class. Potential applicants are identified<br />

through a variety of means, but primarily by screening the<br />

Preliminary Scholastic Aptitude Test (PSAT) and Armed Services<br />

Vocational Aptitude Battery (ASVAB) high scorers lists.<br />

Additionally, numerous high school and college fair presentations<br />

are made to generate interest among the college bound<br />

high school student population.<br />

481



A student applying for an NROTC Scholarship completes an<br />

initial applicant questionnaire which establishes his or her<br />

interest. The data supplied on the applicant questionnaire is<br />

used to create a file for the student in the NROTC data base.<br />

The student must then take either the Scholastic Aptitude Test<br />

(SAT) or American College Test (ACT) and request that his or<br />

her scores be released to the NROTC Scholarship Program.<br />

ACT and SAT test data for those students who authorize score<br />

release to the NROTC Program are periodically received by CNRC<br />

and matched with the NROTC interested student file. Those<br />

meeting minimum eligibility scores (presently 450 verbal and<br />

500 math for SAT; 21 English and 23 math for ACT) are invited<br />

to complete a scholarship application. Completed applications<br />

are then compiled by CNRC, forwarded to CNET, and presented to<br />

the selection board for evaluation.<br />

Application Evaluation<br />


In the two-selection-board system of evaluation used by CNRC,<br />

applications were grouped by state, and three- or four-member<br />

selection committees were established to evaluate applicants in<br />

each state group. The number to be selected from a particular<br />

state was provided to the selection committee, and applicants<br />

were selected to meet that target. The early selection board<br />

(November) considered all applications received prior to the<br />

board convening date. The second selection board (February)<br />

evaluated all applications received by the scholarship<br />

application submission deadline, including those of individuals<br />

who were evaluated but not selected by the early selection<br />

board.<br />

In the 1986/87 scholarship year, 50 percent of those selected<br />

to receive an NROTC scholarship were identified by the early<br />

(November) selection board. The balance of those selected were<br />

identified through a series of weekly selection boards which<br />

met from January through March of 1987.<br />

To ensure consistency of scoring, application evaluation<br />

procedures used by each of the weekly selection boards were<br />

similar to those used during the two-selection-board process.<br />

Under these procedures, selection board members were given very

broad guidance and complete discretion in the awarding of<br />

evaluation points. Selection boards each had up to 100 points<br />

available to award to each applicant. The points awarded by<br />

the selection board were then added to a previously calculated<br />

base score called the applicant Quality Index (QI). The<br />

applicant QI is an optimally weighted selection composite<br />

developed by the Navy Personnel Research and Development Center to predict student academic and military performance criteria.

The sum of the selection board score and the Quality Index<br />

determined the applicant's rank ordered position in the group<br />

of all applicants evaluated during the weekly selection board<br />

process.<br />
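As a concrete illustration of the scoring arithmetic described above, the short sketch below rank-orders a handful of applicants by the sum of a Quality Index base score and a selection board score. All names and numbers are invented for the example and are not actual applicant data.

    # Minimal sketch: total score = Quality Index base score + board points,
    # and applicants are ranked on that total (invented records only).
    applicants = [
        {"name": "A", "quality_index": 180.0, "board_points": 62},
        {"name": "B", "quality_index": 195.0, "board_points": 48},
        {"name": "C", "quality_index": 172.0, "board_points": 71},
    ]

    for a in applicants:
        a["total"] = a["quality_index"] + a["board_points"]

    # Scholarships are then offered from the top of the rank-ordered list,
    # subject to allocation targets and physical qualification.
    ranked = sorted(applicants, key=lambda a: a["total"], reverse=True)
    for rank, a in enumerate(ranked, start=1):
        print(rank, a["name"], a["total"])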

482


necessary to respond to the results of that review. The following system, with minor modifications, has been used in the NROTC selection process since.

Current Selection Evaluation System<br />

The current NROTC Scholarship selection board evaluation system<br />

uses the Quality Index as a Base Score for each applicant. The<br />

Quality Index accounts for approximately 66 percent of the<br />

total applicant selection score. The selection board provides<br />

the remaining 34 percent.<br />

The selection board is provided with an evaluation score sheet<br />

that defines specific areas for selection board scoring<br />

consideration. The score sheet is divided into three broad<br />

areas of evaluation: scholarship, military bearing, and<br />

personal attributes. Contained within each area are specific<br />

scoring categories with each category assigned a maximum point<br />

value. Approximately 40 percent of the scoring categories<br />

include a recommended selection board score previously<br />

calculated by computer using established algorithms and raw<br />

data derived through the optical scan process (dot counting).<br />

Selection board members evaluate the application and assign<br />

points for each category. They may adjust the computer-recommended scores, if desired. The number of points assigned

by the selection board member, including those recommended by<br />

computer, is then added to the Quality Index to determine the<br />

applicant's final standing in the rank ordered list of all<br />

applicants.<br />

The categories used by the selection board to evaluate a<br />

scholarship applicant are:<br />

<strong>Military</strong> Potential:<br />

Is applicant a military dependent? Score recommended<br />

Athletic participation Score recommended<br />

JROTC/CAP participation Score recommended

Applicant physical fitness<br />

Motivation for the NROTC Programs<br />

Interviewing officer evaluation<br />

Personal Factors:<br />

Leadership positions held<br />

Score recommended<br />

Involvement in non-school activity<br />

Did applicant experience adversity?<br />

Strength of character<br />

Exceptional achievement<br />

Potential for graduating with a<br />

tech degree Score recommended



Scholarship:<br />

Quality of learning environment<br />

Transcript evaluation for math/<br />

science performance<br />

Intellectual motivation<br />

Teacher evaluations<br />

Evaluation of applicant's statement<br />

Course difficulty<br />

Score recommended<br />

Score recommended<br />

Score recommended<br />

Scholarships are offered based upon the rank order of all<br />

applicants. Adjustments may be necessary to meet specifically<br />

assigned state scholarship allocation targets, active duty,<br />

female, and minority targets, or Navy physical qualifications.<br />

Summary

The current selection board process has worked extremely well.<br />

The structure built into the evaluation system provides the<br />

consistency of applicant evaluation desired in a 6-month<br />

selection board cycle with varied selection board membership.<br />

The cost of that consistency, a limitation in selection board<br />

flexibility, appears to have had a positive effect as well.<br />

Selection board members feel comfortable working within the<br />

more structured system and selection or non-selection decisions<br />

are much more defensible. More importantly, several measures<br />

of incoming freshman class performance indicate that the<br />

process improved the selection board's ability to identify<br />

those most likely to perform well once enrolled in the NROTC<br />

Program. The performance of the scholarship students entering<br />

the program since the revised selection procedures were fully<br />

implemented has improved, with the average freshman year grade<br />

point average increasing from 2.89 in 1988 to 3.0 this past<br />

year. Freshman attrition has also decreased dramatically.<br />

Twenty-two percent of the freshman class attrited from the<br />

program during the 1988 academic year. Freshman attrition for<br />

the 1990 academic year dropped to 14 percent. The selection<br />

board average applicant score dropped to less than 50 percent<br />

of the total points available for awarding. This ensures that<br />

truly exceptional applicants can be awarded enough points for<br />

scholarship selection.<br />

References<br />

Mattson, J.D., Neumann, I., & Abrahams, N.M. (1986).<br />

Development of a revised composite for NROTC selection<br />

(NPRDC TN 87-7). San Diego: Navy Personnel Research and<br />

Development Center.<br />

Owens-Kurtz, C.K., Borman, W.C., Gialluca, K.A., Abrahams, N.M., & Mattson, J.D. (1989). Refinement of the Navy Reserve Officer Training Corps (NROTC) scholarship selection composite (NPRDC Tech. Note TN 90-1). San Diego: Navy Personnel Research and Development Center.

485


Validation of the Naval Reserve Officers Training Corps Quality Index’<br />

Jack E. Edwards Regina L. Burch Norman M. Abrahams<br />

Navy Personnel Research Colorado State University Personnel Decisions<br />

and Development Center Ft. Collins, CO Research Institute, Inc.<br />

San Diego, CA Minneapolis, MN<br />

Using data from Naval Reserve Officers Training Corps (NROTC) entering classes of 1979 and 1980,<br />

Mattson, Neumann, and Abrahams (1986) optimally weighted six academic and personal factors: Scholastic

Aptitude Test-Verbal (SATV), Scholastic Aptitude Test-Math (SATM), high school rating (HSR), an interviewer’s<br />

rating (INTER), the Strong-Campbell Interest Inventory career-tenure scale (SCII), and the Background<br />

Questionnaire career-tenure scale (BQ), to develop a selection composite for predicting three criteria. Recent<br />

Navy policy directed toward increasing the proportion of college graduates with technical degrees has made it<br />

necessary to develop and validate a new selection system that adds a new criterion, choice of technical major<br />

(TECH) to the three previously investigated criteria: college grade point average (GPA), naval aptitude grades<br />

(APT), and naval science grades (NSG).<br />

Obiective<br />

The objective of this paper is to review the development and validation of the new NROTC scholarship

selection composite, the 1989 Quality Index (QI-89). Three steps were included: (a) developing the optimally<br />

weighted QI-89; (b) predicting a new criterion (TECH); and (c) constructing an expectancy table/chart to predict<br />

TECH using a new predictor, engineering-and-science-interest score (ES).<br />

Population<br />

Approach<br />

The population contained 6,609 individuals who had entered NROTC from 1983 to 1987 and completed at<br />

least one semester/quarter of the program. Men comprised 96.5% of the population, and 92.6% of the candidates<br />

were nonminorities. Each person had received a four-year national competition scholarship; had complete data on all seven predictors; had valid scores for GPA, APT, and NSG; were Navy (versus Marine) option; and had a selection code of principal selectee, early select, alternate best, or alternate middle.

Predictors<br />

Six predictors were used to develop the selection composite. A seventh predictor (ES) was used to develop

an expectancy chart for predicting TECH.<br />

SATV and SATM or American College Test (ACT) equivalents. These scores represent the verbal and<br />

quantitative aptitudes of an individual as measured by a national competitive testing program designed for college<br />

admissions and scholarship awards. If an individual took the standardized test(s) on multiple occasions, the<br />

highest score was used in the analyses. ACT scores were translated to equivalent SATV and SATM scores using<br />

a recently developed conversion table (Owens-Kurtz, Borman, Gialluca, Abrahams, & Mattson, 1989).

HSR. This measure is based on high school rank in class. It was computed with a two-step procedure.<br />

First, percentile rank was determined by multiplying high school rank by 2, subtracting 1 from that product,

and then dividing the difference by the product of class size times 2. Second, each resulting percentile rank was<br />

converted to an equivalent HSR via tabled values. This second step lessened the effect of the negatively skewed distribution of percentile ranks. HSR values can range from 0 to 100 in increments of 10.
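A brief numerical sketch of the first step may be helpful; the conversion table used in the second step is not reproduced in this paper, so only the percentile computation is shown and the rank and class size below are illustrative.

    # Minimal sketch of the percentile-rank step for HSR.
    def percentile_rank(rank_in_class, class_size):
        # (2 x rank - 1) / (2 x class size); rank 1 is the top of the class.
        return (2 * rank_in_class - 1) / (2 * class_size)

    p = percentile_rank(rank_in_class=5, class_size=200)
    print(p)  # 0.0225, i.e., roughly the top 2 percent of the class
    # The second step maps this percentile through tabled values to an HSR
    # between 0 and 100 in increments of 10 (table not reproduced here).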

INTER. During a 15-minute interview, an officer rated an applicant on factors important to a career as a<br />

1 This research was supported by the Office of Naval Technology, Program Element 0602233N. The opinions expressed in this paper are those of the authors, are not official, and do not necessarily reflect the views of the Navy Department. This paper was presented in November 1990 at the annual meeting of the Military Testing Association at Orange Beach, AL as part of J. W. Tweeddale's (Chair) symposium, The Naval Reserve Officers Training Corps (NROTC) Scholarship Selection System.

486


---<br />

-_ ------__ -_-._-_--.-_.--- -__-.___--_<br />

naval officer (e.g., poise and the officer's willingness to have the individual serve under his/her command). Each applicant was assigned an overall rating of very high (1) to very poor (5). For consistency, this scale was reverse scored.

SCII. This scale consists of 76 item-responses from the Strong-Campbell Interest Inventory that predict<br />

officer retention for at least one year beyond the minimum obligated service (Neumann & Abrahams, 1978a). The authors reported a biserial correlation of .19 between the SCII and extended service. Scores can range from 62 to 138.

BQ. The career tenure scale, developed in 1981, is based on 14 biodata and personality items from Rimland's (1957) Background Questionnaire. Neumann (personal communication, 1989) reported a biserial correlation of .12 between the BQ and NROTC attrition. Scores can range from 93 to 107.

ES. Engineering and science interests are identified through 132 item-responses from the Strong-Campbell Interest Inventory (Neumann & Abrahams, 1978b). The authors reported biserial correlations of .56 and .58 between ES and choice of final major for two cross-validation samples. Scores on this scale can range from 31 to 163.

Criteria<br />

Four performance criteria were used individually or in composites. When the four single-criterion regression equations were combined into a composite, the following weights were assigned to the criteria: 40% for GPA, 30% for APT, 20% for NSG, and 10% for TECH. Scores on GPA, APT, and NSG were standardized (using z-scores) within each host or cross-enrollment school. For individuals who attrited prior to the end of the first academic year, scores were cumulated to the time the individual left the NROTC program.
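The within-school standardization can be expressed compactly. The sketch below assumes a hypothetical data frame with one row per midshipman and is illustrative only, not the authors' actual processing code.

    # Minimal sketch of standardizing criterion scores within each host or
    # cross-enrollment school (hypothetical column names).
    import pandas as pd

    df = pd.read_csv("midshipmen.csv")  # hypothetical: 'school', 'gpa', 'apt', 'nsg'

    def zscore(series):
        return (series - series.mean()) / series.std(ddof=0)

    for criterion in ["gpa", "apt", "nsg"]:
        # Each criterion is standardized separately within each school.
        df[criterion + "_z"] = df.groupby("school")[criterion].transform(zscore)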

First-Year GPA. This measure is the grade-point average obtained from all college courses that were taken during the first academic year.

First-Year APT. APT is the first-academic-year grade-point average in nonacademic military aspects of the NROTC program. An individual is assigned a grade of 0 to 4.00 by NROTC instructors on each of approximately 20 performance aspects and personal traits (e.g., goal setting and military bearing). APT is primarily used to determine how well an individual is adapting to the Navy and NROTC.

First-Year NSG. This measure is the grade-point average for naval science courses taken during the first

academic year. These courses are Navy-relevant academic classes that include subjects such as navigation and<br />

seamanship. Students must take eight such courses; most students take one course each semester.<br />

Final TECH. Majors were categorized as either non-technical (1) or technical (2) using categories that were obtained from the Chief of Naval Education and Training (CNET). Individuals with valid scores for TECH represented a subset of the larger sample. TECH was considered valid if the candidate had entered college in (a) 1983, 1984, or 1985 or (b) 1986 and had completed at least one semester/quarter of his/her junior year. TECH was included as an additional criterion in an attempt to maximize the number of scholarships awarded to applicants who would eventually choose a technical college major.

Procedure<br />

Development and cross-validation samples. The 5,957 people entering NROTC between 1983 and 1986 were randomly assigned to either a development or cross-validation sample (N = 3,652 and N = 2,305, respectively). A third sample, 652 individuals who entered NROTC during 1987, was used as a second cross-validation sample to ensure that weights remained stable for the most recent year for which criterion data were available.
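The random split can be sketched as follows, assuming a hypothetical applicant file; the seed is arbitrary and the column layout is invented.

    # Minimal sketch of splitting the 1983-1986 entrants into development
    # and cross-validation samples (illustrative only).
    import pandas as pd

    df = pd.read_csv("nrotc_1983_1986.csv")              # hypothetical file, 5,957 rows
    development = df.sample(n=3652, random_state=1990)   # development sample
    cross_validation = df.drop(development.index)        # remaining 2,305 entrants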

Developing and cross-validating optimally weighted composites. Validity coefficients corrected for range restriction were used in multiple regression analyses to develop optimally weighted selection composites for predicting each of the four individual criteria. Although this procedure results in four separate composite scores, applicants must ultimately be rank-ordered on a single metric in order to make selection decisions. To obtain such an overall composite, the single-criterion composites were combined into the QI-89 in order to predict the four single criteria simultaneously. Weights were derived for these overall composites by combining predictor weights obtained for the single-criterion regression equations.
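The weighting logic can be illustrated with a short sketch. The predictor weights below are invented placeholders (the paper does not report them), so the sketch shows only the form of the computation, not the actual QI-89.

    # Minimal sketch of combining single-criterion regression weights into an
    # overall composite using the stated criterion weights (GPA 40%, APT 30%,
    # NSG 20%, TECH 10%). All predictor weights are invented placeholders.
    import numpy as np

    predictors = ["SATV", "SATM", "HSR", "INTER", "SCII", "BQ"]

    b = {   # one (placeholder) weight vector per single-criterion equation
        "GPA":  np.array([0.10, 0.15, 0.40, 0.05, 0.02, 0.01]),
        "APT":  np.array([0.02, 0.01, 0.30, 0.20, 0.05, 0.03]),
        "NSG":  np.array([0.20, 0.10, 0.30, 0.05, 0.02, 0.01]),
        "TECH": np.array([0.05, 0.45, 0.10, 0.02, 0.01, 0.01]),
    }
    criterion_weights = {"GPA": 0.40, "APT": 0.30, "NSG": 0.20, "TECH": 0.10}

    # Overall composite weights: weighted sum of the single-criterion weights.
    qi89_weights = sum(criterion_weights[c] * b[c] for c in b)

    applicant = np.array([560, 640, 80, 4, 105, 101])    # invented predictor scores
    print(float(qi89_weights @ applicant))               # overall composite score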

487



Each of the composites was then cross-validated on both hold-out samples. The composites were evaluated<br />

for their ability to predict GPA, APT, NSG, and TECH in the 1983-1986 sample, and their ability to predict<br />

GPA, APT, and NSG in the 1987 sample.<br />

Determining effective weights. To assess the percentage of weight that each predictor received in the<br />

selection composites, effective weights scaled to 100% were computed. To compute the effective weights, the<br />

unstandardized b weights for each predictor within a composite were first multiplied by the corresponding standard<br />

deviation for that predictor. The products of b times SD were then summed across all the predictors included<br />

in the composite. The b x SD product for each predictor was then divided by that sum and multiplied by 100.
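A small sketch of this computation follows. The b weights are invented placeholders; the standard deviations are on the scale of the predictor SDs reported in Table 1.

    # Minimal sketch of effective weights scaled to 100%.
    import numpy as np

    b  = np.array([0.010, 0.012, 0.250, 2.100, 0.150, 0.300])  # placeholder b weights
    sd = np.array([76.78, 64.35, 16.81, 0.48, 5.96, 2.32])     # predictor SDs (Table 1)

    products = b * sd                               # b times SD for each predictor
    effective = 100 * products / products.sum()     # percentage of weight per predictor
    print(effective.round(1))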

Constructing the expectancy table. The expectancy table for using the ES scale was developed by first rank-ordering the ES scores of midshipmen. Then, the distribution was divided as equally as possible into five groups. For each fifth, ranging from high to low, the percentage of technical majors was computed.
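The quintile computation can be sketched as follows, assuming a hypothetical data frame with each midshipman's ES score and a 0/1 indicator for a technical final major; this illustrates the procedure rather than reproducing the authors' code.

    # Minimal sketch of the ES expectancy table: split midshipmen into fifths
    # by ES score and compute the percentage of technical majors in each fifth.
    import pandas as pd

    df = pd.read_csv("midshipmen_majors.csv")  # hypothetical: 'es', 'technical' (0/1)

    # qcut assigns each midshipman to one of five roughly equal-sized ES groups
    # (1 = lowest fifth, 5 = highest fifth).
    df["es_fifth"] = pd.qcut(df["es"], q=5, labels=[1, 2, 3, 4, 5])

    expectancy = (df.groupby("es_fifth")["technical"].mean() * 100).round(1)
    print(expectancy.sort_index(ascending=False))  # percent technical, high to low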

Results and Discussion

Development Sample

Table 1 shows the means and standard deviations for the predictors and criteria, and the correlations for<br />

those two sets of variables. This information is provided for both the entire development sample and the

development subsample that had valid scores on the TECH criterion. The validities were corrected for restriction<br />

in range prior to performing the regressions. Validities increased approximately .02 to .03 after corrections.<br />

The SATM means indicate that the average NROTC scholarship student scored at approximately the 89th<br />

percentile in mathematics aptitude. The SATV and HSR means are also above average. For SATV, the average NROTC scholarship student outperforms approximately 87 percent of the college-bound seniors taking the test. Finally, the average NROTC scholarship student had an HSR of 73.55. That HSR value indicates that the average

NROTC scholarship recipient graduated in the top 10% of his/her high school class.<br />

There were negligible differences between the predictor and criterion means and standard deviations for the<br />

full development sample and its subsample. The intercorrelations among GPA, APT, and NSG were slightly<br />

lower for the development subsample than for the full development sample. The three criteria were moderately correlated, with NSG and GPA being the most highly related. This result would be expected because these two criteria measure academic factors; furthermore, NSG is computed from a subset of the courses included in GPA. These relationships are also consistent with Mattson et al.'s (1986) findings. In that study, intercorrelations varied from .40 to .54, and GPA and NSG were the most highly intercorrelated of the three criteria. GPA showed the highest relationship with TECH, a criterion not included in earlier composites. The correlations between the three criteria and TECH were, however, much smaller than the intercorrelations among GPA, APT, and NSG.

The predictor-criterion correlations varied little in magnitude between the full development sample and its<br />

subsample. For both groups, HSR was the variable most highly correlated with GPA and APT. SATV and HSR<br />

showed the strongest relationships with NSG. Although SATM and SATV showed strong relationships with GPA<br />

and NSG, respectively, they showed virtually no relationship with APT. The interview rating, however, was<br />

related to APT. These latter two sets of findings are consistent with the observation that GPA and NSG measure the academic performance of NROTC participants while APT measures military characteristics of the future officers. ES was the predictor most highly correlated with TECH. This outcome was expected because the ES

scale was specifically developed to predict final major. Finally, SATM also had a strong association with TECH.<br />

Cross-Validation<br />


Predictor scores were computed for each of the five composites (i.e., the four single-criterion composites and the QI-89) using data from the hold-out samples. These scores were then correlated with each criterion.

Single-criterion composites. Table 2 shows the cross-validity coefficients that were obtained for the four<br />

single-criterion composites. The cross-validities for the single-criterion composites were provided principally to<br />

show the upper limit of prediction for a given criterion since each composite should predict its own criterion<br />

better than any of the other composites. To use the table, the criterion of interest is located in a right-hand<br />

column, and the predictor composite is located in the left-hand column. The cross-validity for that predictor-criterion combination is found at the intersection of the corresponding column and row. For example, the .122

shown in the first row, second column indicates the cross-validity estimate that was obtained when weights that<br />

488


were derived to optimally predict GPA (for the development sample) were used to predict APT in the 1983-1986 hold-out sample.

Table 1
Descriptive Statistics for the Full Development Sample and the Development Subsample

Variable        Mean      SD     r with GPA
Predictor
  SATVf        558.51    76.78      .124
  SATVr        560.08    76.98      .101
  SATMf        642.91    64.35      .187
  SATMr        642.85    63.99      .183
  HSRf          73.55    16.81      .272
  HSRr          74.45    16.34      .280
  INTERf         4.82      .48      .035
  INTERr         4.83      .46      .030
  SCIIf        105.36     5.96     -.076
  SCIIr        105.49     5.88     -.069
  BQf          100.97     2.32      .007
  BQr          101.09     2.30     -.006
  ESf          110.19    13.55      .013
  ESr          110.62    13.28      .023
Criterion
  GPAf          49.88     9.66     1.000
  GPAr          51.72     8.16     1.000
  APTf          49.71     9.77      .425
  APTr          52.15     8.43      .363
  NSGf          49.80     9.77      .562
  NSGr          51.66     8.57      .546
  TECHr          1.59      .49      .243

f as a subscript denotes the full development sample (N = 3,652).
r as a subscript denotes the reduced development subsample (N = 2,077).
(The remaining columns of Table 1, the correlations of each variable with APT, NSG, and TECH, are not legible in the available copy.)

Of primary interest are the bold-faced values shown on the diagonal. These values reflect the predictive ability for each composite's target criterion; for example, the GPA composite reveals a cross-validity of .289 with the GPA criterion. These diagonal values may be compared with the four corresponding validities observed for these composites in the developmental sample: .327 for GPA; .175 for APT; .297 for NSG; and .455 for TECH. As expected, development-sample validities were slightly higher than the corresponding cross-validities. The GPA, NSG, APT, and TECH composites each predicted its target criterion better than any of the other composites. Surprisingly, the APT composite was a better predictor of GPA and NSG than of APT. Overall, three of the four composites (GPA, APT, and NSG) were better predictors of GPA and NSG than of APT and TECH.

489<br />



Table 2
Cross-Validity Coefficients for Single-Criterion and QI-89 Composites

Single-Criterion                       Criteria
Predictor Composite     GPA      APT      NSG      TECH
GPAa                   .289     .122     .237     .138c
GPAb                   .304    -.001     .228      ---
APTa                   .219     .137     .230     .108c
APTb                   .220     .073     .193      ---
NSGa                   .220     .118     .291     .121c
NSGb                   .227     .009     .286      ---
TECHa                  .102     .036     .137     .430c
TECHb                  .123     .056     .063      ---
--------------------------------------------------------
QI-89a                 .282     .126     .246     .131c
QI-89b                 .296     .011     .238      ---

a as a subscript denotes the 1983-1986 hold-out sample (N = 2,305).
b as a subscript denotes the 1987 hold-out sample (N = 652).
c as a subscript denotes the 1983-1986 reduced hold-out sample (N = 1,313) with a valid TECH score.

The coefficients obtained on the second cross-validation sample are shown directly under the bold-faced cross-validities. Across all four single-criterion composites, the cross-validities obtained on the 1987 sample varied little from those obtained on the 1983-1986 sample when GPA and NSG were predicted. All of the composites predicted GPA slightly better in the 1987 sample and NSG slightly better in the 1983-1986 sample. Somewhat larger differences were found when predicting APT. The GPA, APT, and NSG composites predicted APT better in the 1983-1986 sample than in the 1987 sample. The cross-validities for the two samples were both near .00 when the TECH-derived composite was used to predict APT. Inspection of the correlations between APT and several highly-weighted predictors (i.e., HSR, SATM, and SATV) revealed that differences in zero-order validities for the two samples appeared to account for the subsequent differences in predictive ability for these composites.

QI-89. The bottom portion of Table 2 contains cross-validity coefficients for QI-89. In general, the cross-validity coefficients obtained with the QI-89 showed little shrinkage from those obtained when each single-criterion predictor composite was used to predict itself. The one exception occurred when TECH was the criterion. This finding is logical because CNET assigned a relatively small importance rating to TECH when it was combined with the other three criteria. Although the QI-89 was only marginally useful for predicting APT and TECH, it retained a moderate level of predictive ability when used to predict GPA and NSG.

Predicting Technical Majors

As shown in Figure 1, those midshipmen in the upper 20% on the ES scale were more than twice as likely to choose technical majors as those in the lower 20%. To use the table, an individual's ES score is located in the table, and the likelihood of that individual selecting a technical final major can be determined. An adjunct table for estimating ES was used (rather than incorporating ES into the optimally weighted selection composite) so as to avoid eliminating applicants with outstanding credentials who might not receive NROTC scholarships if their interests tended toward non-technical fields of study.

Conclusions<br />

1. Although ES is derived from an instrument (i.e., the Strong-Campbell Interest Inventory) that is susceptible to distortion, results showed that ES can significantly increase the proportion of technical majors.

490<br />

- -


2. While the academically oriented criteria (i.e., GPA and NSG) are predicted reasonably well, there is

room for improvement for the military-performance criterion (APT).<br />

Figure 1. Expected Percentages of Midshipmen Selecting Technical Majors. (The chart groups midshipmen into fifths of the ES distribution: 121 and above, 114 through 120, 107 through 113, 98 through 106, and 97 and below; the expected percentages themselves are not legible in the available copy.)

References

Mattson, J.D., Neumann, I., & Abrahams, N.M. (1986). Development of a revised composite for NROTC selection (NPRDC TN 87-7). San Diego: Navy Personnel Research and Development Center.

Neumann, I., & Abrahams, N.M. (1978a). Construction and validation of a Strong-Campbell Interest Inventory career tenure scale for use in selecting NROTC midshipmen (NPRDC Letter Rep.). San Diego: Navy Personnel Research and Development Center.

Neumann, I., & Abrahams, N.M. (1978b). Identification of NROTC applicants with engineering and science interests (NPRDC Tech. Rep. 78-31). San Diego: Navy Personnel Research and Development Center.

Owens-Kurtz, C.K., Borman, W.C., Gialluca, K.A., Abrahams, N.M., & Mattson, J.D. (1989). Refinement of the Navy Reserve Officer Training Corps (NROTC) scholarship selection composite (NPRDC Tech. Note TN 90-1). San Diego: Navy Personnel Research and Development Center.

491


DEVELOPMENT AND IMPLEMENTATION OF A STRUCTURED<br />

INTERVIEW PROGRAM FOR NROTC SELECTION<br />

Walter C. Borman<br />

University of South Florida<br />

and Personnel Decisions Research Institutes, Inc.<br />

and<br />

Cynthia K. Owens-Kurtz

and Teresa L. Russell<br />

Personnel Decisions Research Institutes, Inc.<br />

The Navy Reserve Officer Training Corps (NROTC) is one of the major sources of Navy and Marine Corps officers. Presently, 40,000 young men and women apply for a 4-year NROTC scholarship each year. Approximately 40% of this total pass an initial screen based on college board scores (minimum 430 verbal and 520 math on SAT or equivalent ACT scores), proper age (between 17 and 21 when school starts, and no more than 25 at estimated time of college graduation), and acceptable progress through high school. Those passing the screen (called Board Eligibles) are required to complete an application blank and to interview with a Naval officer, typically at one of the 43 recruiting district headquarters. The focus of this paper is on this officer interview.

As conducted previously, an officer interviewed each Board Eligible applicant, usually for 15-40 minutes depending upon the personal style of the interviewer and on the interview load (i.e., the number of NROTC Board Eligibles that must be interviewed that day). Interviews were unstructured in that interviewers were free to ask any questions they believed were relevant. After completion of the interview, the interviewer completed a brief rating form.

Experience with the previous NROTC interview showed that ratings were often at or near the top (most effective) end. For example, the mean rating on the Overall Potential scale for Board Eligibles in the most recent class for which data were available (class entering NROTC 1985) was 4.68 on the 5-point scale. Further, when interview ratings (on the Potential scale) were correlated with the NROTC performance criteria, grade point average (GPA), Naval science grades (NSG), and an aptitude rating (APT), results were near zero (Owens-Kurtz, Borman, Gialluca, Abrahams, & Mattson, 1988). Finally, the effective weights for the interview when used along with SAT scores, high school rank, and SCII/BQ scores in regression analyses against these criteria were very low for GPA and NSG and only the third highest contributor to prediction of APT (Owens-Kurtz et al., 1988).

_______________________

This research was supported by funds from the Office of Naval Technology, Program Element 0602233N. The opinions expressed are those of the authors and do not necessarily reflect those of the U.S. Navy.



Accordingly, the NROTC selection interview program appeared to need improvement. The ratings on the interview form showed little differentiation between applicants, and the validity of the interview ratings was low.

One plausible reason for problems with this interview is its unstructured format. Reviews of the employment interview (Arvey & Campion, 1982; Schmitt, 1976) indicate that structured interviews generally provide more valid prediction of performance than do unstructured interviews. A recent meta-analysis found a .35 mean uncorrected validity coefficient for structured interviews, whereas unstructured interviews had a mean uncorrected validity of .11 for the studies included in the analysis (Cronshaw & Wiesner, 1989). It is possible that a structured interview for NROTC selection might improve the interview's validity for identifying applicants likely to succeed in the NROTC program.

This paper first describes development of the structured interview<br />

materials and then an evaluation of interview ratings made during pilot<br />

tests of these materials.<br />

METHOD<br />

Identifying Target Predictor Constructs

The first step in developing a structured interview protocol was to identify the predictor constructs the interview should target.

Accordingly, meetings with officer staff members in five NROTC units were conducted to generate ideas for these predictor constructs. A preliminary list of constructs emerged from sessions with primarily COs and Class Advisors in these units. This list was briefed to the Chief of Naval Education and Training (CNET) staff and to Selection Board members and was then revised based on their feedback. The constructs are: NROTC Interest and Motivation; Leadership Potential; Responsibilities; Organization of Tasks and Activities; and Communication.

Preparing Behavioral Statements for the Rating Scales

At this point, we prepared preliminary behavioral statements to reflect effective, average, and ineffective interviewee responses in each one of the five construct areas. The behavioral statements were based on what recruiters with considerable experience in NROTC selection interviews had observed in actual interviews. We also received feedback from CNET and Selection Board members, and made final revisions. One of the resulting rating scales is shown below, with its behavioral standards.

[Example rating scale not legibly reproduced: behavioral standards describing effective, average, and ineffective levels of expressed desire to be a Naval officer, interest in and knowledge of the Navy/Marine Corps and the Naval Service/NROTC program, and likelihood of accepting the scholarship.]

Preparing Interview Questions

After the interview rating scales were developed, we began preparing questions designed to probe for reports of past behavior relevant to effectiveness in each area. Several questions for each rating category were developed and tried out with recruiters. The recruiters used different questions with different applicants, and noted those that seemed to be most and least effective at eliciting responses useful for making ratings on each scale. The three to four questions that appeared most effective for each category were then presented to CNET, and final revisions to the questions were made.

Preparing Interview Instructions, a Training Videotape, and the Interview Worksheet

In addition to development of the interview protocol rating scales and the interview questions, it was necessary to prepare instructions and an interviewer training videotape to ensure the structured interview is conducted properly. Thus, instructions and the videotape were prepared, along with an interview worksheet, with the interview questions presented and space provided for the interviewer to take organized notes of interviewee responses. The instructions and accompanying videotape provide a brief training program on structured interviewing, explain proper use of


the behavioral statements for guiding interview ratings, and orient the<br />

interviewer to use the questions, the worksheet, and the interview rating<br />

form.<br />

Pilot Testing the Structured Interview

The interview protocol and rating form were pilot tested in two waves with a total of 31 officer interviewers and 93 applicants in seven different locations. Means and standard deviations of the ratings provided data on their spread and overall distribution.

As part of the pilot testing, an interrater reliability study was conducted. One way to assess the quality of data emerging from the new structured interview is to determine how closely two interviewers agree in their independent ratings of the same interviewees. Thus, we initiated an interrater reliability study with 10 officers interviewing a total of 24 applicants. All interviewers were trained to do the structured interview and to use the rating form. Each applicant was interviewed by two officer recruiters.

After each interview session, the interviewer completed the rating form and provided a copy to the researcher. Officers interviewing the same applicant never discussed that applicant before making their ratings, so the interview judgments were generated totally independently. Intraclass correlations were computed for each dimension separately and for the sum of the dimension ratings. This provides an estimate of the across-interviewer consistency of ratings made using the new interview protocol and rating form.
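As a purely illustrative aid (not part of the original study), the following minimal Python sketch shows one way a 2-rater intraclass correlation of the kind reported in Table 1 could be computed under a one-way random-effects model; the simulated ratings and the specific ICC form are assumptions of the sketch.

    import numpy as np

    def icc_oneway(ratings):
        # One-way random-effects intraclass correlation, ICC(1,1).
        # ratings: (n_targets, k_raters); here k = 2 officers per applicant.
        ratings = np.asarray(ratings, dtype=float)
        n, k = ratings.shape
        grand_mean = ratings.mean()
        target_means = ratings.mean(axis=1)
        ms_between = k * np.sum((target_means - grand_mean) ** 2) / (n - 1)
        ms_within = np.sum((ratings - target_means[:, None]) ** 2) / (n * (k - 1))
        return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

    # Hypothetical data: 24 applicants, each rated independently by 2 officers.
    rng = np.random.default_rng(0)
    true_level = rng.normal(4.0, 1.0, size=24)
    pairs = np.column_stack([true_level + rng.normal(0, 0.4, 24),
                             true_level + rng.normal(0, 0.4, 24)])
    print(round(icc_oneway(pairs), 2))   # high agreement for these made-up data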

RESULTS<br />

Means and standard deviations for Wave 2 interview ratings, gathered after major revisions of the interview protocol (after Wave 1 pilot testing), appear in Table 1. For these 59 applicants, means are close to 4.0 (on a 5-point scale) and standard deviations are approximately 1.0. Further, these means compare favorably with data for the previous interview (M = 4.68 in 1985). Of course, this is not a very fair comparison because ratings on the new format were gathered for research, whereas the 4.68 rating is based on operational ratings. Nonetheless, applicant ratings using the new protocol appear to provide reasonable spread for the interview ratings of these typically high quality NROTC applicants.

Table 1 also contains the interrater reliability coefficients for<br />

ratings made of the 24 applicants evaluated by two independent<br />

interviewers. These are very high reliabilities, with considerable<br />

agreement shown on the part of the interviewers.<br />

TABLE 1

Means, Standard Deviations, and Interrater Reliability Coefficients for New Interview Protocol Ratings (N=59)

Dimension                               M       SD     Reliability(a)
NROTC Interest and Motivation           3.95    1.07    .81
Leadership Potential                    3.78    1.26    .87
Responsibilities                        4.00     .95    .81
Organization of Tasks & Activities      4.20     .87    .83
Communication                           4.24     .99    .93
Overall Evaluation                      3.98     .97    .93
Sum of First Five Dimensions           20.17    4.35    .95

a. These are 2-rater intraclass correlations; N=24

In addition, officer interviewers who used the new interview<br />

procedures were asked their opinions about this protocol compared to the<br />

previous one. Their comments are summarized in Table 2.<br />

TABLE 2

Summary of Comments on the New Interview Protocol Rating Form

- Big improvement over old form
- Not too long or burdensome (interviews timed at 12-30 minutes including answering candidate questions, not filling out form; average about 15-18 minutes)
- Concept of behavioral standards well understood and accepted
- Worksheet especially helpful when doing several interviews back-to-back without completing the ratings
- Videotape seen as very clear and useful
- Interview program takes pressure off interviewer by providing good questions to ask
- Interview program gives diverse interviewer types (e.g., NROTC staff, officer recruiters, Reservists, etc.) more common frame of reference

DISCUSSION AND CONCLUSIONS<br />

Evaluations of the new interview materials and procedures by NROTC Selection Board members, NROTC officers, and officer recruiters responsible for interviewing NROTC applicants (as well as data from field tests of the interview) suggest that these materials and procedures are ready for implementation. What is most urgently needed to evaluate the usefulness of


the new interview is criterion-related validity information. Future validation efforts will be important in evaluating the value of interview ratings by themselves and in combination with other measures (e.g., college board scores) in predicting important NROTC criteria such as GPA, NSG, and APT, and perhaps attrition from the scholarship program. The interrater reliability study on the new interview (Borman & Owens-Kurtz, 1989) and data from Table 1 suggest that the interview has good potential for improving the prediction of NROTC student performance. However, validity data are needed to assess its usefulness in actual practice.

REFERENCES<br />


Borman, W. C., & Owens-Kurtz, C. K. (1989). Development and field test of a structured interview protocol for NROTC selection (Institute Report 178). Minneapolis, MN: Personnel Decisions Research Institutes, Inc.

Cronshaw, S. F., & Wiesner, W. H. (1989). The validity of the employment interview: Models for research and practice. In G. R. Ferris & R. W. Eder (Eds.), The employment interview: Theory, research and practice. Beverly Hills, CA: Sage.

Owens-Kurtz, C. K., Borman, W. C., Gialluca, K. A., Abrahams, N. M., & Mattson, J. D. (1988). Refinement of the Navy Reserve Officer Training Corps (NROTC) scholarship selection composite (Institute Report 144). Minneapolis, MN: Personnel Decisions Research Institutes, Inc.

Schmitt, N. (1976). Social and situational determinants of interview decisions: Implications for the employment interview. Personnel Psychology, 29, 79-101.

497<br />

I<br />

I<br />

j<br />

!


Development of an Experimental Biodata/Temperament Inventory for NROTC Selection1

Mary Ann Hanson and Cheryl Paullin<br />

Personnel Decisions Research Institutes, Inc.<br />

Walter C. Borman

University of South Florida and Personnel Decisions Research Institutes, Inc.<br />

One component of the Naval Reserve Officer Training Corps (NROTC) Scholarship program selection process in need of revision or replacement is the Biographical Questionnaire (BQ). The BQ key (Neumann, Githens, & Abrahams, 1967), which was developed to predict officer retention beyond initial obligated service, is somewhat dated and does not correlate well with NROTC performance criteria. In addition, the BQ itself was developed over forty years ago (Rimland, 1957). Much has been learned in the meantime about the development of biodata items, and many of the BQ items appear dated. Thus, the development of a new biodata instrument seemed in order. This paper will describe the development, preliminary evaluation, and refinement of an experimental biographical data and temperament inventory designed to predict NROTC performance and attrition.

Method<br />

Developing the Pilot Profile of Experiences and Characteristics (PEC)

A rational, construct-based approach was taken, both to develop and to refine this new experimental inventory. The first step in developing the inventory was to more clearly specify the criterion constructs it is designed to predict. Performance measures currently used by the NROTC were identified (e.g., Naval Science Grades), and the constructs that underlie these performance measures were specified. The underlying performance constructs identified were academic achievement, leadership, military bearing, and goal setting. Attrition from the NROTC program occurs for a variety of reasons, and the underlying causes of attrition include academic failure, inaptitude, and dislike for the military (see Owens-Kurtz, Gialluca, & Borman, 1989). The present research focused on identifying predictors of the performance and attrition constructs for which prediction is presently poor. Because academic achievement and academic failure are predicted at least moderately well by existing predictors, less emphasis was placed on identifying predictors of these criteria.

A literature review was conducted to identify individual differences constructs, especially biographical and temperament constructs, that have shown empirical links with criteria similar to the NROTC performance and attrition constructs in past research. Item-level validities for several other inventories were also reviewed. Eight individual differences constructs were identified that have been found, in past research, to be valid predictors of criteria similar to the NROTC performance/attrition constructs. These eight constructs were labeled: (1) Achievement Motivation; (2) Team Orientation; (3) Dominance; (4) Sociability; (5) Leadership Orientation; (6) NROTC/Military Interest and Motivation; (7) Organization and Planning; and (8) Responsibility.

Items were written to tap each of these eight constructs. Past research (e.g., Doll, 1971) has shown that responses to verifiable items (i.e., items for which the truthfulness of responses can be checked using an external source) are less often distorted. Because biodata items typically deal with observable

1 This research was supported by funds from the Office of Naval Technology, Program Element 062233N.<br />

The opinions expressed are those of the authors, and do not necessarily reflect those of the U.S. Navy.<br />



behaviors, these items are more likely to be verifiable. Thus, an effort was made to include as many biodata items as possible in the pilot version of the PEC. However, when sufficient numbers of biodata items could not be written to adequately cover a construct, temperament items were also included. Between 13 and 21 items were written to tap each of the eight predictor constructs. In order to detect response distortion by applicants if it occurs, a ten-item response validity scale (called the Unlikely Virtues scale) was also developed and included in the inventory. Thus, the pilot version of the inventory, called the Profile of Experiences and Characteristics (PEC), contained 151 items.

Evaluating the PEC<br />

Both rational and empirical approaches were taken in evaluating and refining the PEC. The rational<br />

approach was a retranslation exercise in which researchers independently categorized the PEC items into<br />

the eight biodata/temperament constructs. The empirical approach involved administering the PEC to a<br />

large sample of NROTC applicants. The inventory was also administered to a comparison sample of<br />

NROTC scholarship students.<br />

Retranslation Exercise<br />

The retranslation exercise had two purposes: (1) to determine whether researchers would agree concerning<br />

the placement of items on constructs; and (2) to obtain information that could be used to further<br />

revise and refine the composition of the constructs and their definitions. Seven researchers who were<br />

knowledgeable about biodata and/or personality research were asked to independently sort each of the<br />

PEC items into one of the construct categories according to the perceived match between item and category<br />

content. The degree of agreement among these researchers concerning the placement of items was<br />

then evaluated.<br />
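A minimal sketch (illustrative only; the 5-of-7 agreement rule is taken from the results reported below, while the item labels and data are assumptions) of how such a sorting exercise can be tallied:

    from collections import Counter

    def retranslation_agreement(sorts, min_agree=5):
        # sorts: {item_id: [construct chosen by each judge]}
        # An item "retranslates" cleanly when at least min_agree judges
        # place it in the same construct category.
        agreed = {}
        for item, votes in sorts.items():
            category, count = Counter(votes).most_common(1)[0]
            if count >= min_agree:
                agreed[item] = category
        return len(agreed) / len(sorts), agreed

    # Hypothetical sorts by seven judges for three items.
    example = {
        "item_01": ["Dominance"] * 6 + ["Sociability"],
        "item_02": ["Leadership Orientation"] * 4 + ["Dominance"] * 3,
        "item_03": ["Responsibility"] * 7,
    }
    proportion, placements = retranslation_agreement(example)
    print(proportion, placements)   # items 01 and 03 meet the 5-of-7 criterion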

Pilot Test<br />

The PEC was administered to all Board Eligible NROTC applicants who were processed for the 1990 NROTC scholarship program between 18 December 1989 and 30 January 1990 as part of their application process. Completed PEC inventories were obtained for 972 NROTC applicants from nearly all of the 41 Navy Recruiting Districts. About 90 percent of the respondents in this pilot test sample were either 18 or 19 years old, and 91 percent were male.

Frequency counts were conducted to identify and eliminate items where the vast majority of respondents marked the same response alternative. Next, a rational scoring scheme was developed so that a preliminary set of item- and scale-level scores could be computed. When criterion data become available, this scoring system may need to be modified. The item-level scores that were computed were intercorrelated and factor analyzed.
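The frequency-count screen described above might look something like the following sketch; the data and the 90 percent cutoff are assumptions for illustration, not the study's actual rule.

    import numpy as np

    def flag_low_variance_items(responses, max_modal_prop=0.90):
        # responses: (n_respondents, n_items), each entry coding the chosen
        # alternative. Items whose most popular alternative exceeds the
        # cutoff proportion are candidates for elimination.
        flagged = []
        for j in range(responses.shape[1]):
            counts = np.bincount(responses[:, j])
            if counts.max() / responses.shape[0] > max_modal_prop:
                flagged.append(j)
        return flagged

    # Hypothetical data: 1,000 respondents, 3 items with 5 alternatives each.
    rng = np.random.default_rng(1)
    data = rng.integers(0, 5, size=(1000, 3))
    data[:, 2] = 4            # nearly everyone endorses the same alternative
    data[:5, 2] = 0
    print(flag_low_variance_items(data))   # [2]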

Comparison with “Honest” Sample<br />

In order to obtain some base rate information regarding how "honest" respondents (i.e., respondents who have little to gain by distorting their responses) score on the PEC, a comparison sample of students already enrolled in the NROTC scholarship program was administered the PEC under instructions to respond as honestly as possible. A total of 175 first-year NROTC scholarship students from the University of Minnesota, Notre Dame University, and Carnegie-Mellon University completed the PEC in January 1990. This sample was 93 percent male.



Data from this comparison sample were scored using the same procedures that were used in the applicant sample. Mean item-level scores from the NROTC student sample were compared with those from the pilot-test sample in order to identify items with substantially different base rates. If an item's mean score in the applicant group is slanted considerably more in the socially desirable direction than that of the student sample, it suggests that the item is relatively easily distorted by applicants.
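For illustration only, a sketch of that comparison; the standardized-difference flagging rule and all data here are assumptions, not the authors' actual procedure.

    import numpy as np

    def flag_distortable_items(applicant_scores, student_scores, threshold=0.5):
        # Flag items whose applicant mean exceeds the "honest" student mean
        # by more than `threshold` pooled standard deviations, assuming a
        # higher keyed score is the more socially desirable response.
        diff = applicant_scores.mean(axis=0) - student_scores.mean(axis=0)
        pooled_sd = np.sqrt((applicant_scores.var(axis=0, ddof=1) +
                             student_scores.var(axis=0, ddof=1)) / 2)
        d = diff / pooled_sd
        return np.where(d > threshold)[0], d

    # Hypothetical keyed item scores: 972 applicants, 175 students, 4 items.
    rng = np.random.default_rng(2)
    applicants = rng.normal([3.2, 3.5, 4.6, 3.0], 0.8, size=(972, 4))
    students = rng.normal([3.2, 3.4, 3.8, 3.0], 0.8, size=(175, 4))
    flagged, effect_sizes = flag_distortable_items(applicants, students)
    print(flagged)   # item 2 stands out in this made-up example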

Refining the PEC<br />

Results from the pilot test data analyses, along with the information from the retranslation exercise,<br />

were used to revise the composition of the PEC constructs and their definitions. The inventory was then<br />

refined and shortened for future administrations. Descriptive statistics, internal consistency reliabilities,<br />

and scale score intercorrelations were computed for the final shortened scales.<br />

Results and Discussion

Evaluating the PEC

Retranslation Results

Seventy-seven percent of the PEC items were sorted into the same predictor construct scale by five<br />

or more of the seven researchers who participated in the retranslation exercise. Seven of the remaining<br />

items were from the Unlikely Virtues (response validity) scale. It is not particularly surprising that some<br />

researchers sorted the Unlikely Virtues items into the construct categories. The Unlikely Virtues items<br />

were specifically written to resemble the eight original construct categories (so they would be subtle).<br />

The fact that some of the judges mistakenly sorted the Unlikely Virtues items into the construct categories<br />

suggests that the items are indeed subtle. In general, however, there was good agreement among the<br />

researchers concerning the placement of PEC items on constructs.<br />

Pilot-Test Results<br />

Frequency counts revealed that the vast majority of the PEC items had an adequate spread of responses across the response alternatives. Only a few items had response distributions that were considered unacceptable (e.g., over 90 percent of the respondents chose the most desirable response alternative). However, for some items the response distributions were much better than for others. This information was taken into account in refining the PEC, particularly in making decisions concerning which items to drop.

The item-level intercorrelations were factor analyzed, and rotated principal factor solutions containing from 2 to 12 factors were examined. Based on a parallel analysis (Montanelli & Humphreys, 1976), the amount of variance accounted for by each factor, and the interpretability of the solutions, the eight-factor solution was selected for further consideration.
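The parallel-analysis step can be sketched as follows. This is a simplified version that uses the plain correlation matrix rather than the squared-multiple-correlation diagonal of Montanelli & Humphreys, and the synthetic data are assumptions for illustration.

    import numpy as np

    def parallel_analysis(data, n_draws=50, seed=0):
        # Retain factors whose observed eigenvalues exceed the mean
        # eigenvalues of random normal data of the same size.
        rng = np.random.default_rng(seed)
        n, p = data.shape
        real_eigs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
        random_eigs = np.zeros(p)
        for _ in range(n_draws):
            noise = rng.standard_normal((n, p))
            random_eigs += np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
        random_eigs /= n_draws
        return int(np.sum(real_eigs > random_eigs))

    # Hypothetical item scores generated from a single common factor.
    rng = np.random.default_rng(3)
    factor = rng.standard_normal((500, 1))
    items = factor @ rng.standard_normal((1, 8)) + rng.standard_normal((500, 8))
    print(parallel_analysis(items))   # typically suggests one factor here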

The amount of overlap between the results of the retranslation exercise and the factor analysis was encouraging. Items from the Leadership Orientation scale defined a factor, and nearly all of the items that were retranslated into this scale had their highest loading on that factor (8 of 11). Similarly, Organization and Planning and NROTC/Military Interest and Motivation also defined their own factors. Achievement Motivation defined a factor, but most of the Responsibility items (7 of 12) also loaded on this factor. Dominance and Team Orientation each defined a factor, and the Sociability items were split between these two factors, with the Sociability items involving friendliness loading on the Team Orientation factor and those involving talkativeness and assertiveness loading on the Dominance factor. Clearly the retranslation and the factor analysis results converged on very similar sets of constructs.



Comparison with "Honest" Sample

The applicant sample generally chose more desirable response options (i.e., response options that led<br />

to higher scores) than the NROTC sample. For most of the PEC items, the difference between the mean<br />

item-level scores for the two groups was quite small. However, for a few items the difference was large,<br />

especially when the “correct” response was fairly obvious. These latter items are probably the most<br />

susceptible to distortion, and this information was considered in deciding which items to drop.<br />

Refining the PEC<br />

The results of the factor analysis and the retranslation exercise were both taken into account in defining the final set of PEC constructs. Where the retranslation and the factor analysis suggested a slightly different set of constructs, rational considerations guided formation of the final constructs. For example, although the Responsibility items were grouped with the Achievement Motivation items in the factor analysis, the literature review suggested that these two predictor constructs would be related to somewhat different criterion constructs. Therefore, Achievement Motivation and Responsibility were kept separate. Revisions were made to many of the PEC constructs based on the retranslation and the pilot test analyses, resulting in a final set of seven "revised" biodata/temperament constructs. These revised constructs are listed on the left side of Table 1.

Table 1

Descriptive Statistics for the Final (Shortened) PEC Scales

Scale                                       # of Items   Mean   SD   Reliability(1)
[# of Items, Mean, and SD columns not legibly reproduced in the source]
Achievement Motivation                          --        --    --      .82
Dependability                                   --        --    --      .59
Social Comfort                                  --        --    --      .73
Dominance                                       --        --    --      .82
Leadership Orientation                          --        --    --      .78
NROTC/Military Interest and Motivation          --        --    --      .73
Organization and Planning                       --        --    --      .80
Unlikely Virtues(2)                             --        --    --      n/a
Miscellaneous(3)                                --        --    --      n/a

Note. Ns range from 962 to 964 for means and standard deviations; from 898 to 953 for the reliabilities. (Computation of coefficient alpha required complete data.)
1 Coefficient alpha.
2 Descriptive statistics are not presented for the final Unlikely Virtues scale because it contains new items.
3 The Miscellaneous category is not a scale, so descriptive statistics are not appropriate.
n/a


After the new construct structure was delineated, each PEC item was assigned to a construct/scale according to its factor loadings and item content. Items that did not fit well into any construct were placed in a "Miscellaneous" category. Item-total correlations were then computed for these revised construct scales, and these were used to help guide decisions concerning which items to retain in the final (shortened) PEC.
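As an illustration, item-total correlations of the kind used here could be computed as follows; the "corrected" variant (correlating each item with the sum of the remaining items) is one common choice and an assumption of this sketch, as are the data.

    import numpy as np

    def corrected_item_total(scale_items):
        # Correlate each item with the sum of the other items on its scale;
        # low values point to items that do not hang together with the scale.
        scale_items = np.asarray(scale_items, dtype=float)
        total = scale_items.sum(axis=1)
        return np.array([
            np.corrcoef(scale_items[:, j], total - scale_items[:, j])[0, 1]
            for j in range(scale_items.shape[1])
        ])

    # Hypothetical 10-item scale with one poorly fitting item.
    rng = np.random.default_rng(5)
    trait = rng.standard_normal((950, 1))
    items = trait + rng.standard_normal((950, 10))
    items[:, 9] = rng.standard_normal(950)
    print(np.round(corrected_item_total(items), 2))   # last value near zero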

The final step in the present research was to shorten and refine the PEC. Decisions concerning which items to drop took into account the pilot test results, the comparison sample results, the retranslation results, and the item content. A few promising items that did not fit well into any of the predictor constructs were retained and placed in a "Miscellaneous" category. A total of thirty-two content scale items were dropped. In addition, several of the Unlikely Virtues scale items were revised or replaced based on results from the comparison and pilot sample data analyses.

The final (shortened) version of the PEC contains 116 items distributed across the final construct scales as shown in Table 1. Table 1 also presents descriptive statistics for these scales. All of the scales except Dependability have very good internal consistency reliability. The internal consistency of the Dependability scale is comparatively low, and the mean score on this scale is also quite high in the applicant sample. This scale was retained for further study in spite of these problems because, based on the literature review, it is expected to predict attrition. Descriptive statistics are not reported in Table 1 for the final Unlikely Virtues scale, because this scale contains new and revised items. Table 2 presents the intercorrelations among these final construct scale scores.

Table 2

Scale Intercorrelations for the Final (Shortened) PEC Scales

                                              AM    DP    SC    DM    LO    NR    OP
Achievement Motivation (AM)
Dependability (DP)                           .56
Social Comfort (SC)                          .32   .18
Dominance (DM)                               .42   .29   .47
Leadership Orientation (LO)                  .44   .28   .42   .56
NROTC/Military Interest and Motivation (NR)  .42   .35   .25   .33   .31
Organization and Planning (OP)               .58   .42   .19   .27   .31   .34
Unlikely Virtues (UV)*                       .45   .30   .25   .33   .25   .35   .33

Note. Ns range from 962 to 964.
* Sum of 7 retained items.



Conclusions<br />

The final experimental PEC measures seven biodata/temperament constructs. Each of the construct scales seems to be reasonably homogeneous and focused on the intended personal characteristics, experiences, and motivation constructs. The inventory has good potential for enhancing the prediction of NROTC performance and attrition. Further research is needed to evaluate the validity of the PEC for predicting performance and attrition.

REFERENCES<br />

Doll, R. E. (1971). Item susceptibility to attempted faking as related to item characteristics and adopted fake set. Journal of Psychology, 77, 9-16.

Montanelli, R. G., Jr., & Humphreys, L. G. (1976). Latent roots of random data correlation matrices with squared multiple correlations on the diagonal: A Monte Carlo study. Psychometrika, 41, 341-348.

Neumann, I., Githens, W. H., & Abrahams, N. M. (1967). Development and evaluation of an officer potential composite (NPRDC TR 98-18). San Diego: Navy Personnel Research and Development Center.

Owens-Kurtz, C. K., Gialluca, K. A., & Borman, W. C. (1989). Examination of the attrition coding system and development of potential attrition predictors for the Navy Reserve Officer Training Corps (NROTC) program (Institute Report No. 179). Minneapolis: Personnel Decisions Research Institutes, Inc.

Rimland, B. (1957). The development of a fake-resistant test for selecting career-motivated NROTC scholarship recipients (PRFASD Report No. 112). San Diego: U.S. Naval Personnel Research Field Activity.


PSYCHOLOGICAL APPLICATIONS TO ENSURING PERSONNEL SECURITY:<br />

A SYMPOSIUM<br />

BORMAN, W., University of South Florida and Personnel Decisions Research<br />

Institutes, Inc.;<br />

BOSSHARDT, M., DUBOIS, D., and HOUSTON, J., Personnel Decisions Research<br />

Institutes, Inc., Mpls., MN;<br />

CRAWFORD, K., Defense Personnel Security Research and Education Center,<br />

Monterey, CA;<br />

WISKOFF, M., and ZIMMERMAN, R., BDM International, Inc., Monterey, CA;

SHERMAN, F., Marine Security Guard Battalion, Quantico, VA.<br />

The national security and financial consequences of unsuitable conduct<br />

and compromise of classified information by persons in sensitive or<br />

high security risk jobs are enormous. To protect against these types<br />

of unreliable behavior, a personnel security program is utilized by

the Department of Defense. This program has two major emphases. The<br />

first involves screening individuals who are being considered for<br />

initial clearances. The second emphasis is the ongoing or continuing<br />

assessment of cleared personnel. With respect to initial screening,<br />

investigative interview procedures and background questionnaires to<br />

screen applicants are discussed. Regarding continuing assessment,<br />

current military service programs and an approach to assessing Marine<br />

Security Guard behavior are described. The symposium concludes with a<br />

discussion of some of the difficulties encountered by practitioners<br />

and new approaches to improve personnel security practices.<br />



THE INVESTIGATIVE INTERVIEW:

A REVIEW OF PRACTICE AND RESEARCH

David A. DuBois and Michael J. Bosshardt<br />

Personnel Decisions Research Institutes, Inc.<br />

Martin F. Wiskoff<br />

BDM International, Inc.

Introduction<br />

An important element in safeguarding national security is maintaining personnel security.<br />

Each year thousands of individuals are assigned to jobs that provide them with access to<br />

extremely sensitive or classified information that could adversely affect national security. The<br />

Background Investigation (BI) is the primary method for screening personnel for positions<br />

requiring a Top Secret clearance. This method relies principally on obtaining information from<br />

an interview with the subject. Supplementary information is gathered through self-report<br />

background questionnaires, local agency checks, national agency checks, credit checks, and

interviews with character and employment references. This information is evaluated against<br />

various administrative criteria by adjudicators.<br />

Objectives<br />

The objectives of this paper are to (1) describe the investigative interview, (2) review<br />

research related to the investigative interview, and (3) identify directions for future research in<br />

this area.<br />

Research Approach<br />

A literature review was conducted to identify empirical and descriptive studies related to<br />

the investigative interview. Specifically, computerized and manual literature searches were<br />

performed, as well as a telephone survey of experts from academia, industry, professional<br />

associations, and integrity test publishing companies. In addition, detailed information on<br />

investigative interview practices within the federal government was obtained through site visit<br />

interviews with 10 senior officials at five federal agencies.<br />

What is the investigative interview?<br />

The investigative interview is a method used for gathering information to determine the<br />

reliability of individuals for working in positions of trust or positions that provide access to<br />

extremely sensitive or classified information. Most investigative interviews are conducted by<br />

organizations within the military services, federal government, and defense industry. These

interviews can involve either the subject or persons who know the subject (e.g., references, past<br />

employers). They are typically conducted by interviewers who are trained in interviewing<br />


methods and nonverbal communication techniques (e.g., kinesics, proxemics). The interview,<br />

which may be conducted using a variety of formats (e.g., structured, semi-structured,<br />

unstructured), typically covers topics such as honesty, substance abuse, emotional stability, and<br />

financial irresponsibility. This interview information is then summarized in narrative or rating<br />

form, and combined with other information about the subject (e.g., from self-report background<br />

questionnaires, local and national agency checks, credit checks). Senior adjudicators then make<br />

final screening decisions.<br />

Current Investigative Interview Practice<br />

Ten senior officials from five government organizations [Defense Investigative Service<br />

(DIS), the Office of Personnel Management (OPM), the Federal Bureau of Investigation (FBI),<br />

the Central Intelligence Agency (CIA), and the Defense Intelligence Agency (DIA)] were<br />

interviewed to obtain detailed information regarding current investigative interviewing practice<br />

for individuals being considered for Top Secret personnel security clearances. Each interview<br />

lasted 1 to 3 hours. A composite description of the major features of both subject and nonsubject<br />

investigative interviews is presented below.<br />

Preparation<br />

Overall, the interview procedures followed by these agencies are remarkably similar in<br />

many respects. The interviewer generally prepares for the interview by reviewing available<br />

background information about the subject for missing, discrepant, and issue-oriented<br />

information. From this background information, specific interview questions are developed.<br />

Setting<br />

Subject and non-subject interviews are often conducted in different settings. Subject<br />

interviews are usually conducted in a government office setting, whereas non-subject interviews<br />

are less likely to be held in an office. In both types of interviews, privacy and freedom from<br />

distractions are the principal requirements for the interview setting.

Conduct<br />

Guidelines for interviewer conduct are similar across agencies. These guidelines include<br />

acting in a professional manner, dressing in a businesslike manner, and being courteous,

respectful, and non-judgmental.<br />

Format<br />

The investigative interview is conducted in four phases: introduction, background form<br />

review, issue development, and conclusion. Each phase is different in content and tone. The<br />

entire subject interview typically takes from one half to one hour in length.<br />

Introduction. During the introduction, the interviewer usually (although not always)<br />

shows credentials and positively identifies the subject. In this phase, the interviewer develops<br />


rapport with the subject, explains the interview purpose and format, and secures a verbal<br />

commitment from the subject to provide truthful and complete information.<br />

At some point during the subject interview, the interviewer informs the subject of the<br />

Privacy Act. This may be done at the beginning of the interview (e.g., DIS, OPM) or near the

end of the interview (e.g., FBI, CIA). OPM subject interviews are conducted under oath. None<br />

of the other agency officials mentioned use of an oath, although DIS interviewers seek written<br />

signed statements when the subject provides significant derogatory information.<br />

Background Review. Following the introduction to the subject interview, the interviewer<br />

generally reviews the subject’s background history form. During this phase of the interview, the<br />

interviewer questions the subject about specific items on the form, emphasizing items that have

been identified as omitted or discrepant during the preparation phase. A review of each item on<br />

the form is generally not undertaken.<br />

Issue Development. In the issue development phase, the interviewer systematically<br />

questions the subject on a range of topics. In most agencies, a standard list of topics is covered. These topics, which are similar across agencies, include education, employment, residence, alcohol, drugs, mental treatment, moral behavior, family and associates, foreign connections, foreign travel, financial responsibility, organizations, loyalty, criminal history, handling

information, and trust. Coverage of interview topics generally begins with questions on the<br />

subject’s background (e.g., education, employment) and later proceeds into the more sensitive<br />

areas.<br />

Conclusion. The concluding phase of the interview is focused on answering any<br />

concerns of the interviewee. The next steps of the security clearance process are also explained<br />

at this time.<br />

Interview Procedures<br />

A variety of techniques are used to facilitate the investigative interview process.<br />

Interviewers are typically trained in four general categories of interviewing skills: motivation,<br />

questioning, observation, and listening.<br />

Motivation. Subjects are motivated to disclose sensitive information to the interviewer

in several ways. The interviewer ensures that the interviewee understands the purpose, format,<br />

and content of the interview. The “whole person” concept of adjudication is explained so that<br />

the interviewee understands that negative information is judged in terms of the circumstances of<br />

the situation, how long ago it happened, etc., and in terms of the positive qualities of the person.<br />

The interviewee is informed of the consequences of omitting or providing misleading<br />

information. The interviewer typically secures a verbal commitment to provide complete and<br />

truthful information.<br />

Rapport is maintained by displaying a non-judgmental attitude, fairness, and respect.<br />

Objections are managed by clearly identifying the nature of the objection or hesitation, re-stating it to the interviewee, and addressing concerns directly.

Questioning. Several questioning approaches are used in conducting subject interviews. Although the topic areas are generally structured, only DIS emphasizes use of a structured set of questions for each topic. DIS interviewers typically ask four to seven short, direct questions regarding a subject area, followed by summarizing questions. Other agencies use more open-ended questions, followed with summary or verification questions. Interrogative questioning methods are not generally used by DIS, but are occasionally used by FBI interviewers.

Observation. All of the agencies visited train their interviewers to look for possible<br />

verbal and nonverbal cues to deception on the part of the interviewee. Most of these indicators<br />

are based on noticing patterns of various verbal, paralinguistic, and nonverbal (body gestures,<br />

facial expression) indicators. When possible deception is detected, the interviewer may remind<br />

the subject of the importance of honesty, and that confidentiality is maintained.<br />

Listening. Interviewers are trained to listen to the whole response, to use active listening<br />

procedures, and to follow-up vague responses with questions that draw out details. Techniques<br />

such as re-statement and paraphrasing are used to encourage elaboration.

Documentation<br />

Investigators normally take only limited (or no) notes during the interview. OPM<br />

interviewers tend to take the most extensive notes, while FBI interviewers generally take fewer<br />

notes. Upon completion of the interview, interviewers write or dictate a short report<br />

summarizing the results of the interview.<br />

Decision-Making<br />

In all agencies, interviewers obtain the interview information but adjudicators make the<br />

clearance decisions. OPM is unique in that it conducts interviews on a contract basis for over 90<br />

Federal agencies.<br />

Empirical Research<br />

No published empirical studies were found regarding the use of investigative or integrity<br />

interviews. The literature search did identify five unpublished studies, most of which were pilot<br />

studies.<br />

The most relevant of these compared the relative effectiveness of two types of<br />

background investigations--one with a subject interview and one without. Conducted by the<br />

Defense Investigative Service (Office of Personnel Investigations, 1986) the study involved a<br />

random sample of 471 military members, contractor employees, and DOD civilian personnel.

For the 186 cases in which significant adverse information was identified, the background<br />

investigation which included the investigative interview developed significant information in<br />

164 of these cases. Furthermore, the procedure which included the investigative interview<br />

yielded 72 cases not identified by the traditional procedure. Based on these results, the research<br />



staff concluded that inclusion of the subject interview resulted in a significant improvement in<br />

the background investigation procedure.<br />

A survey by the Director of Central Intelligence (Office of Personnel Investigations,

1986) of 12 government agencies examined the productivity of various sources for the purposes<br />

of applicant screening and security clearances. Background investigation sources included in<br />

this study were subject interviews, neighbor interviews, education and employment record<br />

checks, national agency checks, and the polygraph. The results of the study suggested that the<br />

subject interview was the second most productive source for identifying serious adverse<br />

information.<br />

Flyer (1986) summarized much of the early personnel security screening literature<br />

conducted in the military. Although no data were presented, he noted that the most important<br />

finding of Air Force research on personnel security screening was “the unique and considerable<br />

value of the subject interview."

In summary, the limited research on investigative interviews suggests that they may be<br />

useful personnel security screening devices.<br />

Related Research<br />

Although the research on investigative interviews is scarce, there is a wealth of research<br />

on interviewing in other contexts (e.g., employment, survey research). This research is useful to<br />

the extent that it suggests additional techniques to apply in the investigative interview setting or<br />

provides a theoretical model that explains interviewing behavior.<br />

For example, with respect to question characteristics, research examining eyewitness<br />

testimony (Lipton, 1977) compared the relative effectiveness of open-ended vs. close-ended<br />

questions. The results indicated that narrative, open-ended formats tend to produce very<br />

accurate, but incomplete information. Close-ended, interrogatory formats, on the other hand,

tend to produce more complete, but less accurate information. This led one researcher (Loftus,<br />

1982) to suggest that open-ended questions be used first, followed by specific (close-ended)<br />

questions to ensure that complete information is obtained.<br />

The decades of research on employment interviews and the more recent research on the<br />

detection of deception provide a rich source of ideas for improving investigative interviewing<br />

procedures. Many of these ideas have been recently summarized in a review of investigative<br />

interviewing and related research (Bosshardt, DuBois, Carter, & Paullin, 1989).<br />

While these large scientific literatures on related interviewing techniques can provide<br />

many ideas, there is a strong need to thoroughly investigate the utility of these ideas in the<br />

investigative interview setting before adopting them in practice. A careful consideration of the<br />

very different contexts that exist between the investigative interview and other interview settings<br />

suggests that results may not generalize, or that the effects may not be the same.<br />

For example, the purpose of the investigative interview is to screen out people, while the<br />



purpose of the employment interview is to select in personnel. The rejection rate for<br />

investigative interviews is about 1% to 5%, while the selection ratio for employment interviews<br />

is typically about 20% to 60%. The focus of investigative interviews is on behavioral constructs<br />

such as behavior unreliability and unsuitability, while employment interviews focus on cognitive<br />

ability, motivation, and communication skills. Perhaps most importantly, the motivational<br />

approach used is very different for these two settings. The consequence of providing good information in an investigative interview is the avoidance of punishment, whereas for the employment interviewee it is the reward of a job.

Needed Research<br />

Although the interview has been extensively studied as a method of gathering<br />

information, little research is available regarding its use in the investigative interview setting. A<br />

variety of investigative interviewing procedures are currently in use and the large literature on<br />

other interview settings suggests additional procedures to consider. Research is needed to

systematically evaluate the effectiveness of these various investigative interview methods.<br />

One major finding from employment interview research can probably be generalized to the investigative interview--the most impressive gains in interview validity result from the systematic study of the performance criteria that are to be predicted. Research that defines the psychological dimensions and behavioral detail of security-relevant performance can contribute to significant improvements in interviewer training, in assessing personnel security risks, and in predicting unreliable behavior.

REFERENCES<br />

Bosshardt, M. J., DuBois, D. A., Carter, G. W., & Paullin, C. (1989). The investigative interview: A review of practice and related research (Technical Report No. 160). Minneapolis, MN: Personnel Decisions Research Institutes, Inc.

Flyer, E. S. (1986). Personnel security research: Prescreening and background investigations (Report No. HumRRO-FR-86-01). Alexandria, VA: HumRRO International, Inc.

Lipton, J. P. (1977). On the psychology of eyewitness testimony. Journal of Applied Psychology, 62(1), 90-95.

Loftus, E. (1982). Interrogating witnesses--good questions and bad. In R. M. Hogarth (Ed.), Question framing and response consistency. San Francisco: Jossey-Bass.

Office of Personnel Investigations. (1986). Subject interview study: Phase I report. Washington, D.C.: U.S. Office of Personnel Management.



UTILITY OF A SCREENING QUESTIONNAIRE FOR SENSITIVE MILITARY OCCUPATIONS

Ray A. Zimmerman
Martin F. Wiskoff
BDM International, Inc.

Background

Each of the military services prescreens enlisted applicants for sensitive occupations, i.e., those that require a Top Secret clearance, access to Sensitive Compartmented Information, or are included in the Nuclear Weapons Personnel Reliability Program. The

prescreening occurs prior to the initiation of the<br />

Personnel Security Investigation (PSI) and is designed<br />

to: (a) reduce the probability of assigning unreliable<br />

individuals to sensitive positions and (b) cull out<br />

individuals who are likely to be denied a security<br />

clearance. Crawford and Wiskoff (1988) in their review<br />

of the prescreening procedures used by the military<br />

services, found that they had been developed without<br />

empirical assessment of their validity and utility. As an<br />

example, they pointed out that despite intensive<br />

prescreening, the discharge rate from military service<br />

for reasons of unsuitability was not much lower for<br />

high-security occupations than that for other military<br />

jobs.<br />

The security interview at the <strong>Military</strong> Entrance<br />

Processing Station (MEPS) is the first step in the<br />

prescreening process for enlisted Army applicants to a<br />

sensitive job. Prior to the interview, applicants<br />

complete the Army Security Screening Questionnaire<br />

(DAPC-EPMD FORM 169-R). Responses to the<br />

questionnaire are examined by a security interviewer<br />

and explored further during an interview with the<br />

applicant. For those applicants who are accepted into<br />

a sensitive job and placed into the Delayed Entry<br />

Program (DEP), a second 169-R is completed and an<br />

interview conducted upon completion of the DEP.<br />

Purpose

The purpose of this investigation was to explore the effectiveness of the 169-R as a security prescreening instrument, in terms of: (a) the degree to which it is able to predict two operational screening decisions and a measure of personnel reliability and (b) the utility or impact of using the information it provides, along with other applicant data. The study was preliminary in that only a small sample was analyzed to determine whether it would be fruitful to conduct a large scale study. A more complete discussion of the study and the results is available in Zimmerman, Fitz, Wiskoff, and Parker (in press).


Sample<br />

Army Security Screening Questionnaires filled out<br />

by applicants from 1981 through 1986 were collected<br />

from MEPS throughout the country. Only the<br />

questionnaires completed during 1984 were used<br />

because: (a) the questionnaire had been revised<br />

several times during the years prior to 1984 and<br />

(b) individuals completing questionnaires after 1984<br />

would not have had the opportunity to finish their first<br />

term of service. Questionnaires were available for<br />

2,870 applicants. From these a random sample of 281<br />

non-prior service males was drawn. Analyses<br />

indicated that the sample appears to match the<br />

population of 1984 applicants to high security<br />

occupations fairly well in terms of Armed Forces<br />

Qualification Test (AFQT) scores and demographic<br />

variables such as race, age at service entry and level<br />

of education.<br />

Predictor Measures<br />

The Army 169-R administered in 1984 consists of a series of 45 questions which can be answered "yes" or "no," relating to: (a) Prior Military and Federal Service, (b) Foreign Connections, (c) Drug Use, (d) Alcohol Use, (e) Emotional Stability, (f) Sexual Misconduct, (g) Financial Problems, (h) Employment Problems, (i) Delinquency, and (j) Legal Offenses. For each affirmative response, the applicant must provide details of the specific incidents or experiences. In addition, applicants must supply detailed information about current financial obligations and any previous arrests, citations, or other types of contact with the legal system. Most applicants can complete the 169-R in approximately one-half hour.

For this study, two classes of predictors were<br />

taken from the 169-R: (a) yes/no items and<br />

(b) detailed information that was transformed into<br />

coded items. There were 50 coded items analyzed as<br />

predictors.<br />

Other applicant data that are available at the time<br />

of the security interview were examined in conjunction<br />

with 169-R responses. These additional predictors<br />

included AFQT category, age at entry into the Army<br />

and level of education. The data were obtained from personnel records available at the Defense Manpower Data Center (DMDC).


Criterion Measures<br />

Crawford and Trent (1987) note that in personnel security research, the focus is on whether an individual demonstrates reliability, trustworthiness, good judgment and loyalty in the actual handling and use of classified information. Failure of the individual could be manifested at one level in excessive security violations and at the extreme in the deliberate compromise of classified information, including espionage.

Fortunately, compromise and espionage exhibit a very low base rate. Security violations, while more frequent, also show a low base rate, and in addition information on commission of violations is not available in centralized data bases.

Three alternative criteria were used in this study:

1. Prescreening adjudication decision. This decision is made at the MEPS after the applicant has completed the 169-R and the security interview. The security interviewer, after consultation with security personnel within his/her chain of command, determines whether the applicant should be allowed to continue processing for a sensitive occupation. Many of the rejected applicants enter the Army in non-sensitive occupations and receive lower level security clearances. Historically, approximately 33 to 47% of applicants are rejected at this stage of processing.

2. Issue Case status. If derogatory information<br />

is discovered during the course of the PSI, the<br />

investigation is expanded and designated as an “issue<br />

case.” This designation indicates, in most instances,<br />

that there is some evidence of a blemish in an<br />

individual’s behavior, associations, etc. that may be a<br />

cause to question his/her qualifications to handle classified material. Issue case status has been employed as an operational criterion in previous studies (Crawford and Trent, 1987; Wiskoff and Dunipace, 1988). Data concerning issue case status were obtained from the Defense Central Index of Investigations (DCII), a copy of which is maintained for research purposes at DMDC.

3. Type of discharge. This variable refers to<br />

whether or not the individual was discharged from the<br />

Army for reasons of unsuitability. Unsuitability attrition<br />

is operationally defined as those accessions listed on

the DMDC Cohort File having inter-service separation<br />

codes 60-87 for failure to meet minimum behavioral or<br />

performance standards. Type of discharge has been<br />

used in many studies of military service attrition.<br />

Analyses

Only the data from the second administration of

the 169-R were used for individuals who had<br />

completed the form twice, i.e. entering and leaving<br />

DEP. This was necessary, because for these<br />

individuals, the final prescreening adjudication measure<br />

represents a decision that is based on information from<br />

the second set of responses.<br />


The first set of analyses focused on the validity of the instrument. First, a series of correlational analyses was conducted to examine the relationship between each of the yes/no and coded items and the criterion measures. Next, empirical scoring keys for each of the criteria were developed using the horizontal percent method (Guion, 1965). The total score for each key was subsequently correlated with each criterion measure. In addition, AFQT category and age at entry into the Army were examined for their incremental validity in predicting issue case status and type of discharge. Level of education could not be used because there were too few individuals who did not have a high school diploma.
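The text cites Guion (1965) for the horizontal percent method but does not reproduce the computation. The sketch below illustrates one common form of the method under stated assumptions: the weight for each response alternative is the percent of examinees giving that response who fall in the favorable criterion group, and an applicant's key score is the sum of the weights for his responses. The item names and data are hypothetical, not the study's items or files.

    # A minimal sketch of building an empirical scoring key with the horizontal
    # percent method (after Guion, 1965). Item names and example data are hypothetical.
    from collections import defaultdict

    def horizontal_percent_key(responses, criterion):
        """responses: list of dicts mapping item -> response (e.g., 'yes'/'no').
        criterion: list of 0/1 flags (1 = favorable outcome, e.g., cleared).
        Returns {item: {response: weight}}, where each weight is the percent of
        respondents giving that response who were in the favorable group."""
        counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [favorable, total]
        for resp, crit in zip(responses, criterion):
            for item, answer in resp.items():
                cell = counts[item][answer]
                cell[0] += crit
                cell[1] += 1
        key = {}
        for item, answers in counts.items():
            key[item] = {a: 100.0 * fav / tot for a, (fav, tot) in answers.items()}
        return key

    def score(key, resp):
        """Total key score for one applicant: sum of the weights of his responses."""
        return sum(key[item].get(answer, 0.0) for item, answer in resp.items())

    # Toy usage with two hypothetical items
    responses = [{"drug_use": "yes", "bad_checks": "no"},
                 {"drug_use": "no",  "bad_checks": "no"},
                 {"drug_use": "no",  "bad_checks": "yes"},
                 {"drug_use": "no",  "bad_checks": "no"}]
    criterion = [0, 1, 0, 1]  # 1 = favorable prescreening adjudication
    key = horizontal_percent_key(responses, criterion)
    print(score(key, responses[0]))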

The second set of analyses examined the utility of<br />

decisions based on cutoff scores for the empirical<br />

scoring keys. Utility was assessed by examining the<br />

percentage of individuals that can be identified and<br />

screened out using the empirical scoring keys and their

associated cutoff scores, for different combinations of<br />

AFQT and age at entry categories.<br />

Results<br />

It is important, in examining the findings of this<br />

study, to note that the data for the three criterion<br />

measures do not represent the progression of a single<br />

cohort through the screening process. That is, each

applicant’s predictor data were matched to his/her<br />

criterion data without regard for how the person fared<br />

on the other criteria. For instance, it is possible for an<br />

applicant to have been screened out of a sensitive job<br />

during the prescreening adjudication and still have criterion data on type of discharge, as long as the

person did enlist in the Army in a non-sensitive<br />

occupation.<br />

In reviewing the relationships of the individual items to the criteria, it should be remembered that some types of negative behavior are relatively rare or are not often admitted. This low base rate for an item serves to restrict the variance of the variable and attenuate its correlation with the criterion. Overall, 11 items showed statistically significant relationships with prescreening adjudication, three with issue case status, and only one with type of discharge. Drug use and financial problems were the two content areas with the most significant relationships.

The validity coefficients for the empirical scoring<br />

keys and the regression models (including the<br />

empirical keys and additional applicant data) are<br />

displayed in Table 1. Each key shows a significant<br />

correlation with the criterion it was designed to predict.

Both the prescreening adjudication key and the issue<br />

case status key had fairly strong correlations with<br />

prescreening adjudication and issue case status. Only<br />

the type of discharge scoring key was significantly<br />

correlated with all three criteria, although the r’s only<br />

ranged from .12 to .15.


Table 1
Validity Coefficients for Empirical Scoring Keys and Regression Models

                                            Prescreening    Issue Case    Type of
                                            Adjudication      Status      Discharge

Empirical Scoring Key
  Prescreening Adjudication                                    .21**        -.02
  Issue Case Status                             .25**          .27**        -.02
  Type of Discharge                             .12*           .15*          .15*

Regression Model
  Issue Case Status key, AFQT, and Age                         .27**
  Type of Discharge key, AFQT, and Age                                       .22**

* p < .05   ** p < .01

Regression analyses were performed to examine the incremental validity of the additional applicant data. AFQT was collapsed into high (I-IIIA) and low (IIIB or below) categories. Age was collapsed into three categories: (a) 17 year olds, (b) 18-20 year olds, and (c) 21 year olds or older. In Table 1 it is seen that there is no evidence of incremental validity in predicting issue case status by including AFQT category or age at entry. However, for type of discharge, the validity coefficient increases from .15 to .22 with the addition of these variables.
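A minimal sketch of the incremental-validity comparison described above, assuming ordinary least squares with dummy-coded AFQT and age categories. The arrays (key_score, afqt_high, age_cat, discharge) are simulated stand-ins for the study data and will not reproduce the reported coefficients.

    # Compare the multiple correlation for the scoring key alone with the key
    # plus dummy-coded AFQT and age-at-entry categories (hypothetical data).
    import numpy as np

    def multiple_r(X, y):
        """Multiple correlation between y and the columns of X (with intercept)."""
        X1 = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return np.corrcoef(X1 @ beta, y)[0, 1]

    rng = np.random.default_rng(0)
    n = 281
    key_score = rng.normal(size=n)                      # empirical key total score
    afqt_high = rng.integers(0, 2, size=n)              # 1 = AFQT category I-IIIA
    age_cat = rng.integers(0, 3, size=n)                # 0: 17, 1: 18-20, 2: 21 or older
    age_dummies = np.eye(3)[age_cat][:, 1:]             # two dummies, 17-year-olds as reference
    discharge = (rng.random(n) < 0.135).astype(float)   # 1 = unsuitability discharge

    r_key = multiple_r(key_score.reshape(-1, 1), discharge)
    r_full = multiple_r(np.column_stack([key_score, afqt_high, age_dummies]), discharge)
    print(f"key only R = {r_key:.2f}, key + AFQT + age R = {r_full:.2f}")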

Figure 1 displays the 169-R items that are included within the scoring keys for each of the criteria. Four of the items, i.e., times marijuana use, times intoxicated, visits for nervous, emotional, mental counseling, and suspended/expelled from school, appear in all three scoring keys. Three other items appear in two of the keys, while the remaining nine are found in only one of the keys.

Items included in all three scoring keys (prescreening adjudication, issue case status, and type of discharge):
- Times marijuana use
- Times intoxicated
- Visits for nervous, emotional, mental counseling
- Suspended/expelled from school

Items included in two of the scoring keys:
- Frequency of marijuana use
- Possessed, transported, grown, produced, etc., drugs
- Left job under less than favorable conditions

Items included in one scoring key:
- Used hard drugs
- Transported, sold, etc., alcohol
- Frequency of alcohol usage
- Pregnant or caused pregnancy
- Written bad checks
- Made delinquent payments
- Experienced financial problems
- Unsafe vehicle/licensing violations
- Ran away or considered running from home

Figure 1. Form 169-R items included in empirical scoring keys



The final analysis looked at the utility of the scoring keys, defined as a reduced risk of:

(a) having a security clearance denied to an individual<br />

who has been assigned to a sensitive duty position<br />

and (b) assigning unreliable individuals to sensitive<br />

duty positions. The utility was evaluated by first<br />

establishing cutoff scores and then determining what<br />

the impact would have been if the empirical keys and<br />

cutoff scores had been used in prescreening.<br />

The goal, in setting the cutoff scores, was to<br />

screen out individuals with low scores on the empirical<br />

keys and yet fulfill existing manpower requirements. In<br />

this sample, 19% of the non-prior service male<br />

applicants were rejected in the prescreening<br />

adjudication phase. Thus, cutoff scores were<br />

established for the three scoring keys at the point<br />

closest to the 19%/81% split.
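The sketch below illustrates one way such a cutoff could be located and evaluated, assuming that higher key scores are favorable and that roughly 19% of applicants are to be screened out. The scores and criterion flags are simulated, not the study data.

    # Place a cutoff at the point closest to the 19%/81% split and evaluate it
    # against a criterion (hypothetical scores and issue-case flags).
    import numpy as np

    def pick_cutoff(scores, target_reject=0.19):
        """Return the cutoff whose below-cutoff proportion is closest to the target."""
        candidates = np.unique(scores)
        rejected = [(np.mean(scores < c), c) for c in candidates]
        return min(rejected, key=lambda rc: abs(rc[0] - target_reject))[1]

    rng = np.random.default_rng(1)
    scores = rng.integers(0, 40, size=281).astype(float)   # empirical key scores
    issue_case = rng.random(281) < 0.08                    # True = became an issue case

    cut = pick_cutoff(scores)
    above, below = scores >= cut, scores < cut
    print(f"cutoff = {cut}, rejected = {below.mean():.0%}")
    print(f"issue-case rate above cutoff = {issue_case[above].mean():.1%}, "
          f"below cutoff = {issue_case[below].mean():.1%}")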

Table 2
Impact of Using Cutoff Scores on the Issue Case and Unsuitability Discharge Rates

                                         Issue Case Status          Type of Discharge
Empirical                                Percent     Percent        Percent        Percent
Scoring Key           Score              Issue       No Issue       Unsuitable     Normal

Prescreening          Below cutoff        25.9        74.1           11.5           88.5
Adjudication          Above cutoff         5.6        94.4           14.0           86.0

Issue Case Status     Below cutoff        22.2        77.8            9.4           90.6
                      Above cutoff         5.3        94.7           14.5           85.5

Type of Discharge     Below cutoff        16.1        83.9           24.4           75.6
                      Above cutoff         6.7        93.3           11.7           88.3

Regression model      Below cutoff                                   28.0           72.0
with Type of          Above cutoff                                   10.4           89.6
Discharge

Base Rate                                  8.0        92.0           13.5           86.5

Table 2 shows the impact of using the three keys in terms of reducing the issue case and unsuitability discharge rates. The base rate for issue cases in this sample was 8.0%. The percentage of issue cases above the cutoff was lower than the base rate for all three keys, with the issue case status key showing the lowest percentage (5.3%). Thus, the issue case rate could be reduced by approximately three percentage points by using this key. Analysis of DCII data revealed that 289 of the non-prior service males who entered high security occupations in the Army in 1984 became classified as issue cases. Thus, approximately 98 of these individuals would not have been allowed into high security occupations if the issue case status scoring key and its cutoff had been used for prescreening.


The base rate for applicants who received<br />

unsuitability discharges was 13.5%. Table 2 shows<br />

that the greatest reduction in this rate occurs with the<br />

use of the Type of Discharge key plus the<br />

supplementary predictors, i.e. AFQT category and age<br />

at service entry. At a cutoff score closest to the<br />

19%/81% split, the percentage of unsuitability<br />

discharges would have been reduced to 10.4%, slightly<br />

more than three percentage points below the base rate.<br />

This translates into 99 unreliable individuals who would<br />

have been screened out.<br />
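As a rough check on the counts reported above, the figure of approximately 98 follows from applying the 2.7-percentage-point reduction in the issue case rate to the full cohort of 1984 entrants implied by the 289 issue cases and the 8.0% base rate; the same logic underlies the 99-case figure for unsuitability discharges, applied to the accession cohort (whose size is not given here).

    # Back-of-the-envelope check on the screening counts reported above, assuming
    # the percentage-point reduction is applied to the full cohort of entrants.
    issue_cases = 289          # issue cases among 1984 entrants to high security jobs
    base_rate = 0.080          # issue-case base rate
    above_cutoff_rate = 0.053  # issue-case rate above the issue case status cutoff

    cohort = issue_cases / base_rate                      # roughly 3,600 entrants
    prevented = (base_rate - above_cutoff_rate) * cohort  # roughly 98 issue cases avoided
    print(round(cohort), round(prevented))                # prints 3612 98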

Conclusions and Recommendations<br />

The major caveat in deriving operational<br />

conclusions from the findings of this study was the<br />

relatively small sample size. Other problems, which are discussed in Zimmerman et al. (in press), are: (a) criterion issues, such as the relevance of the criteria to personnel security decisions and the existence of false negatives (e.g., individuals classified as issue cases who are granted their security clearances) and false positives (e.g., individuals who are never classified as an issue case yet are turned down for a security clearance), and (b) the impact of low base rates in both the predictors and criteria.

Despite these caveats, further research on the<br />

169-R, using a large data sample, seems to be<br />

warranted for two reasons. First, the findings of this<br />

report clearly indicate the utility or benefit of using<br />

empirical scoring keys to supplement existing<br />

prescreening procedures based on the 169-R.<br />

Second, for many predictor variables from the 169-R,<br />

cell sizes were too small to compute a valid measure<br />

of association. If all available data for an entire year<br />

were analyzed, more definitive results could be<br />

obtained.<br />

In addition to analyzing a larger data sample, a<br />

potentially fruitful avenue is the revision of the 169-R to<br />

increase its validity.<br />


Note: Since the completion of this study, research has<br />

been initiated on a much larger sample of 169-R forms<br />

completed by applicants in 1986. In addition, a<br />

revision of the 169-R has been developed jointly by the<br />

Defense Personnel Security Research and Education<br />

Center and the U. S. Total Army Personnel Command,<br />

and was operationally implemented on 1 October 1990.<br />

References<br />

Crawford, K. S., & Trent, T. (1987). Personnel security prescreening: An application of the Armed Services Applicant Profile (ASAP) (PERS-TR-87-003). Monterey, CA: Defense Personnel Security Research and Education Center.

Crawford, K. S., & Wiskoff, M. F. (1988). Screening enlisted accessions for sensitive military jobs (PERS-TR-89-001). Monterey, CA: Defense Personnel Security Research and Education Center.

Guion, R. M. (1965). Personnel testing. New York: McGraw-Hill.

Wiskoff, M. F., & Dunipace, N. E. (1988). Moral waivers and suitability for high security military jobs (PERS-TR-88-011). Monterey, CA: Defense Personnel Security Research and Education Center.

Zimmerman, R. A., Fitz, C. C., Wiskoff, M. F., & Parker, J. P. (in press). Preliminary analysis of the U.S. Army Security Screening Questionnaire (PERS-TN-90-008). Monterey, CA: Defense Personnel Security Research and Education Center.


Continuing Assessment of Cleared Personnel in the Military Services

Michael J. Bosshardt

David A. DuBois<br />

Personnel Decisions Research Institutes, Inc.<br />

Kent S. Crawford<br />

The Defense Personnel Security Research and Education Center<br />

Problem and Background

Examination of recent espionage cases suggests that few spies enter<br />

government service with the intent to commit espionage. Instead, most<br />

individuals become spies as a result of personal and situational factors<br />

that occur after they receive a personnel security clearance. This<br />

suggests that an ongoing program of continuing assessment (CA) for cleared<br />

personnel should be an important component of the personnel security<br />

process.<br />

Two other factors underscore the importance of the CA program. First,<br />

initial clearance screening procedures tend to be costly, involve<br />

conditions of very low base rates, and have unknown validity. Second,<br />

hostile intelligence activities probably focus more effort on currently<br />

cleared personnel than on uncleared individuals.<br />

Despite its importance and the fact that formal CA programs have been in<br />

existence for a number of years, little is known about operational CA<br />

programs (DOD Security Review Commission, 1985). In order to address this<br />

deficiency, a project was initiated to evaluate how well CA programs are<br />

operating in the military services. The principal activities in this<br />

project included a review of regulations and literature related to CA<br />

(DuBois & Bosshardt, 1990), a survey of personnel at 60 Army, Air Force,<br />

Navy, and Marine Corps installations world-wide to obtain detailed

information about CA programs (Bosshardt, DuBois, Crawford, & McGuire,<br />

1990; Bosshardt, DuBois, & Crawford, 1990a), and an analysis of systems<br />

issues related to CA (Bosshardt, DuBois, & Crawford, 1990b).<br />

Objectives<br />

The objectives of this paper are to (1) present some of the key findings of<br />

this survey of CA programs and (2) provide a preliminary assessment of the<br />

effectiveness of these programs.<br />

Approach<br />

The initial step in the study involved a review of regulations and<br />

literature related to CA. We then conducted a series of meetings with<br />

service branch headquarters and adjudication officials to gain a further<br />

understanding of CA policies and programs. Following this, nine military<br />



installations were visited to obtain an understanding of operational CA<br />

programs in the military and to gather information necessary for developing<br />

the survey research approach.<br />

These research activities led to the development of three preliminary<br />

survey forms. The principal form was a structured interview protocol for<br />

installation security office representatives. Two shorter survey forms<br />

were also developed for unit security managers and unit commanders.<br />

Preliminary versions of these forms were reviewed by several CA experts and<br />

pilot tested prior to actual survey administration.<br />

The survey forms were administered between September, 1989 and January,<br />

1990. The sample included 60 sites (21 Air Force, 19 Army, 18 Navy, and 2<br />

Marine Corps). Forty-eight were sites where individuals primarily had<br />

collateral access (i.e., top secret, secret, or confidential access) and 12<br />

were sites where individuals primarily had SCI access; ten were overseas

sites. Overall, completed survey forms were received from 60 installation<br />

security managers, 126 unit security managers, and 88 unit commanders.<br />

Results and Discussion<br />

The structured interview protocol for installation security managers<br />

included approximately 60 open-ended questions and numerous rating items.<br />

Two key issues concern the best sources of CA-relevant information and the<br />

most frequently reported types of CA information. Data concerning both<br />

issues are presented below.<br />

Sources of CA Information. Installation security managers were asked to<br />

rate the willingness of various groups to share derogatory information of<br />

security relevance with the security office. The results indicated that<br />

the military police, the clearance adjudication facility, and the<br />

investigations office are among the most willing to share information with<br />

the security office. Several types of installation personnel (e.g.,<br />

installation commanders, unit commanders, unit security managers) received<br />

moderate to high ratings. Most installation departments (e.g., medical,<br />

personnel, legal) and non-installation groups (e.g., local civilian police,<br />

federal agencies) were perceived as only moderately willing to share<br />

derogatory CA information. Employee assistance groups received relatively<br />

low ratings. Not surprisingly, coworkers and subjects were rated as least<br />

willing to share derogatory information.<br />

Types of CA Information Reported. Installation security managers estimated

the number of valid derogatory incidents reported to their security office<br />

during the past year for each of 12 types of information. The mean number<br />

of reported incidents (per 1000 cleared individuals) for various areas is<br />

shown in Table 1.<br />

A complete summary of all results is presented in Bosshardt, DuBois, Crawford, and McGuire (1990).



Table 1
Mean Estimated Number of Valid Derogatory Incidents Reported to Collateral
and SCI Installation Security Offices During the Past Twelve Months
(Per 1000 Cleared Individuals)

Type of Reported Incident                                 Collateral Sites    SCI Sites

Alcohol abuse                                                  12.1              --
Other incidents (e.g., non-judicial punishments)                9.5              --
Drug abuse                                                      6.6              --
Criminal felony acts not covered in other categories            3.4              --
Financial problems                                              3.1              --
Court martials/desertions                                       3.1              --
Falsification of information acts                                --              --
Emotional/mental/family problems                                 --              --
Security violation incidents                                    2.1              --
Sexual misconduct                                               1.6              --
Foreign associations/travel incidents                            --              --
Disloyalty to the U.S.                                           --              --

Note. The samples include 43 collateral sites and 12 SCI sites; entries shown as -- could not be read in the source copy.

The results in Table 1 suggest that alcohol abuse and other incidents (e.g., NJPs) are the most frequently reported areas at both collateral and SCI sites. Overall, the average number of reported incidents across all incident categories (per 1000 cleared individuals) is 46.9 for collateral sites and 42.3 for SCI sites.

The CA survey yielded a considerable amount of quantitative and qualitative<br />

data. In addition to the interview data provided by installation security<br />

managers, four types of data were gathered: (1) ratings by installation<br />

security managers, unit security managers, and unit commanders of 136<br />

obstacles in maintaining an effective CA program, (2) write-in responses<br />

(n=684) by these three groups regarding the major CA problems, (3) ratings<br />

by installation security managers of 143 suggestions for improving CA, and<br />

(4) write-in suggestions (n = 636) by installation security managers, unit<br />

security managers, and unit commanders for improving CA.<br />

In order to have a common basis for comparing the quantitative and<br />

qualitative data and to facilitate the interpretation of the survey<br />

results, a taxonomy of CA problem/recommendation (or "finding") areas was<br />

developed. This taxonomy included eight general categories: (1) security<br />

education for cleared personnel; (2) training for security personnel; (3)<br />

derogatory information indicators, sources, and methods; (4) clearance<br />

adjudication procedures; (5) accountability for CA; (6) CA regulations; (7)<br />

CA emphasis; and (8) CA system considerations (e.g., legal issues, number<br />

of cleared personnel).<br />



Obstacles in Maintaining an Effective CA Program. In general, analyses of

the quantitative and qualitative survey data indicated that security<br />

education for cleared personnel, training of security personnel, and<br />

derogatory indicators, sources, and methods are the biggest obstacles in<br />

maintaining an effective CA program across the eight taxonomy areas. The<br />

clearance adjudication process, the emphasis on CA, and CA system<br />

considerations received moderately high rankings across the "CA obstacles"<br />

data sets. CA regulations and accountability for CA received the lowest<br />

overall rankings.<br />

Ratings of 136 specific obstacles to maintaining an effective CA program<br />

were provided by all survey respondents. The six most highly rated items<br />

by collateral site respondents (N=224) are listed below:<br />

- Reluctance of individuals to self-report derogatory information.<br />

- Too much time is taken by the central adjudication facility to make clearance suspension/revocation decisions.

- Lack of/inadequacy of training modules to instruct commanders and supervisors on how to spot, interpret, and manage the early warning indicators of personnel security risks.

- Reluctance of coworkers to report derogatory information.<br />

- Lack of standard training modules for unit commanders, supervisors,<br />

and cleared individuals which describe their continuing assessment<br />

responsibilities.<br />

- Delays in obtaining replacement personnel for individuals who lose<br />

security clearances.<br />

Recommendations for Improving CA. Installation security managers rated 143 suggestions for improving CA using a 10-point rating scale. Those items

receiving the highest mean ratings are listed below:<br />

- Develop training modules to instruct commanders and supervisors on<br />

how to spot and manage the early warning indicators of personnel<br />

security risks and personnel problems.<br />

- Modify the regulations to direct other installation groups to<br />

provide more information to the security office.<br />

- Create a separate, full-time position for personnel security<br />

officers.<br />

- Improve continuing assessment training for supervisors.<br />

- Develop formal reporting procedures and written standards for the<br />

personnel, medical, legal, and other departments which define the<br />

types of information to be shared with the security office.<br />

- Increase/improve continuing assessment training for security<br />

managers.<br />



Effectiveness of CA. There is limited data for assessing the effectiveness<br />

of current CA programs. Findings from the survey indicated that (1)

approximately 80 percent of the installations surveyed maintain some<br />

statistics relevant to CA (e.g., numbers and types of clearances, numbers<br />

of clearance suspensions and revocations, numbers of security violations,<br />

or numbers of reported derogatory incidents), (2) relatively few derogatory<br />

incidents are reported to the security office (see Table 1), and (3) the

number of clearance suspensions and revocations is very small. Table 2<br />

shows the number of clearance suspensions and revocations during the past<br />

12 months for sites in the survey sample.<br />

Table 2
Approximate Types and Numbers of Clearances, Numbers of Clearance
Suspensions, and Numbers of Clearance Revocations for Survey Sites

                               Mean Estimated    Mean Estimated Number of      Mean Estimated Number of
                               Total Number      Clearances Suspended Per      Clearances Revoked Per
                               (per site)        1000 During Past 12 Months    1000 During Past 12 Months
                                                 (per site)                    (per site)

Confidential Clearances             295                 0.8                          0.3
Secret Clearances                  2847                 4.2                          0.5
Top Secret Clearances               583                 1.2                          0.1
Top Secret Clearances
  with SCI Access                   678                 2.4                          1.8

Notes. Estimates are based on information provided by installation security officers. The sample sizes for these analyses ranged from 48 to 54.

Ratings of overall program effectiveness by installation security managers indicated that CA programs are moderately effective. The mean effectiveness ratings were quite similar across service branches, with the Air Force receiving the highest effectiveness rating among collateral sites and the Navy receiving the highest effectiveness rating among SCI sites. The mean effectiveness ratings of SCI and collateral programs were nearly identical within the Army and within the Air Force, but within the Navy SCI sites received much higher ratings than collateral sites.



Installation security managers also rated the effectiveness of several<br />

aspects of the CA program. The results indicated that the clearance<br />

suspension/revocation process, sources of derogatory information, service<br />

branch regulations, indicators of security risk, reporting procedures, and<br />

security education are considered most effective. In contrast, the two<br />

lowest rated program aspects were performance appraisal information and<br />

incentives for reporting. The mean ratings were generally similar across<br />

service branches and for collateral and SCI sites.

In summary, little is known about the effectiveness of existing CA programs in the military services. The limited data suggests that these programs are moderately effective, although they could be improved.

Future Research<br />

Overall, the project resulted in 52 recommendations for improving CA<br />

programs (see Bosshardt, DuBois, Crawford, & McGuire, 1990; Bosshardt,<br />

DuBois, & Crawford, 1990b). The next step in this research program is to<br />

have personnel security experts from DOD, service branch headquarters,<br />

field installations, and the adjudication facilities prioritize these<br />

recommendations. Future research will focus on the highest priority items.<br />

References<br />

Bosshardt, M. J., DuBois, D., & Crawford, K. (1990a). Survey of continuing assessment programs in the military services: Recommendations. Monterey, CA: Defense Personnel Security Research and Education Center.

Bosshardt, M. J., DuBois, D., & Crawford, K. (1990b). Survey of continuing assessment programs in the military services: Systems issues, recommendations and program effectiveness. Monterey, CA: Defense Personnel Security Research and Education Center.

Bosshardt, M. J., DuBois, D., Crawford, K., & McGuire, D. (1990). Survey of continuing assessment programs in the military services: Methodology, analyses, and results. Monterey, CA: Defense Personnel Security Research and Education Center.

DOD Security Review Commission, General Richard Stilwell (Chairman). (1985). Keeping the nation's secrets: A report to the Secretary of Defense by the Commission to Review DOD Security Policies and Practices. Washington, D.C.: Office of the Secretary of Defense.

DuBois, D., Bosshardt, M. J., & Crawford, K. (1990). Continuing assessment of cleared personnel in the military services: A conceptual analysis and literature review. Monterey, CA: Defense Personnel Security Research and Education Center.



A MEASURE OF BEHAVIORAL RELIABILITY
FOR MARINE SECURITY GUARDS

Janis S. Houston
Personnel Decisions Research Institutes

Martin F. Wiskoff
BDM International, Inc.

and

Forrest Sherman
Marine Security Guard Battalion

Problem and Background

The United States Marine Corps provides security guard services to meet the Department<br />

of State requirements at Foreign Service posts throughout the world. This use of Marines<br />

as security guards at Embassies, Legations, and Consulates was initiated in 1948 by a<br />

formal Memorandum of Understanding between the Department of State and the Secretary<br />

of the Navy. The primary mission of the Marine Security Guards is to protect the<br />

personnel, property, and classified and administratively controlled material and equipment<br />

within these premises.<br />

There are approximately 1300 Marine Security Guards (MSGs) currently serving at 140<br />

foreign posts in over 100 countries. These detachments range in size from five to thirty-eight Marines, and each is commanded by a senior non-commissioned officer, referred to

as the “Detachment Commander”.<br />

The work described here was the fourth phase of a research effort undertaken jointly by<br />

the Marine Security Guard Battalion and the Defense Personnel Security Research and<br />

Education Center. Prior phases of this effort focused on improving the procedures used<br />

for pre-screening and selecting Marines for MSG duty, and are described in Parker,<br />

Wiskoff, McDaniel, Zimmerman, and Sherman (1989) and in Wiskoff, Parker, Zimmerman,<br />

and Sherman (1989).<br />

Objective

The primary objective of this work was to develop a system for the continuing evaluation<br />

(CVAL) of MSG performance and behavioral reliability. As has been pointed out<br />

(DuBois, Bosshardt, and Crawford, 1990), recent espionage cases suggest that individuals<br />

become spies as a result of personal and situational factors that occur after they receive<br />

personnel security clearances and are performing in sensitive or high security risk<br />

jobs. The importance of having a continuing assessment program for MSGs, in addition<br />

to the very careful selection procedures, was highlighted in December 1986, when Sgt.<br />

Lonetree admitted to providing information to the Soviet Union while serving in<br />

Moscow as an MSG.<br />



The goal for the CVAL system was to reduce the risk of personnel security incidents and<br />

improve the ability of Detachment Commanders to anticipate personnel problems before<br />

they became major disruptions. Thus, there was some emphasis on being able to use<br />

CVAL as a kind of warning system, one which would indicate when there was a need to<br />

intervene, either with informal counseling or disciplinary action short of judicial punishment.<br />

In this context, then, there were several ancillary objectives for the development of<br />

CVAL: (1) to provide an early warning indicator, with suggestions for intervention; (2)<br />

to provide a leadership, counseling, and training tool for Detachment Commanders; and<br />

(3) to minimize personnel turbulence and facilitate/document personnel decisions made<br />

concerning the reliability of MSGs.<br />

Method of Development<br />

General Orientation. It was felt from the outset that some kind of behavioral checklist

would be an appropriate format for the cornerstone of CVAL. In a recent review of<br />

personnel reliability programs (Bosshardt, DuBois, and Crawford, 1990), the need was<br />

pointed out for more careful definition of the factors that may indicate an individual has<br />

become a security risk. In the current project, we wanted to produce a checklist of<br />

observable behaviors that could indicate when an MSG’s performance was beginning to<br />

exhibit signs of unreliability. This checklist could then be completed by the Detachment<br />

Commander on a regular basis for each MSG, and appropriate action taken.<br />

Sources of Information. The primary source of information for the development of a<br />

CVAL checklist was the huge collection of written examples of MSG<br />

performance/behavior generated in a prior phase of this research effort. These performance<br />

examples were used in the prior research to develop behaviorally-anchored rating<br />

scales that could serve as criteria for validity investigations of the screening procedures<br />

(Houston, 1989).<br />

To obtain the performance examples, workshops were conducted with MSGs, Detachment<br />

Commanders, and the Instructors/Advisors at MSG School, all of whom had prior<br />

experience as MSGs and/or Detachment Commanders. Participants in the workshops<br />

were asked to write (in a structured format) examples of MSG behaviors that were indicative<br />

of extremely effective, average, and extremely ineffective performance. This technique<br />

yielded over 300 examples of behavior that realistically portrayed both highly<br />

effective and highly ineffective MSG performance. The examples were then sorted into<br />

categories that represented important dimensions of the MSG job. The set of dimensions,<br />

and the list of behaviors in each dimension, was the starting point in the development<br />

of a CVAL measure.<br />

Other sources of information included: evaluation forms that had been developed for use<br />

at MSG School, e.g., Peer Evaluation Forms and Screening Board Evaluation Forms;<br />

checklists developed for use as indicators of chemical dependency and emotional instability;<br />

and reports of existing personnel reliability programs, e.g., the Air Force’s Nuclear<br />

Weapons Personnel Reliability Program (PRP), the Department of Energy’s Human<br />

Reliability Program (HRP), and the Navy’s Security Access Eligibility Report (SAER).<br />

Another helpful source of information was the record of MSG Non Judicial Punishments<br />

and Reliefs For Cause kept at MSG Battalion Headquarters. A content analysis of these<br />

records was performed, to determine what types of behavior problems seemed to be the<br />



most common. Finally, MSG Battalion personnel were extensively interviewed, to solicit<br />

their ideas on what behaviors represented potential reliability problems.<br />

Preparation of Behavior Indicators Checklist. All of the information described above<br />

was converted to a list of discrete behaviors that could indicate a potential personnel<br />

security risk. These behaviors were sorted, where possible, into the categories used for<br />

the MSG performance rating scales developed in the prior phase of this research. New<br />

categories were formed where the pre-existing system did not seem to cover clusters of<br />

behaviors, and a number of the pre-existing categories were combined and/or renamed,

as appropriate.<br />

The first draft of the Behavior Indicators Checklist contained 61 behaviors, grouped into<br />

10 clusters or behavior categories. Each of these behaviors was considered to be an<br />

indication that an MSG might be headed for, if not already in, some kind of trouble,<br />

ranging from emotional instability to drinking problems, or simply not realizing the<br />

dangers of becoming too friendly with Foreign Service Nationals about whom little was<br />

known.<br />

Examples of checklist behaviors are: “MSG often becomes disorderly or violent when<br />

drinking” and “A Foreign Service National shows a sudden increase of favors towards<br />

this MSG.” There were a number of behaviors that, while not particularly desirable, may<br />

not indicate a real problem if the behavior is relatively short in duration, for example,<br />

“MSG frequently asks to get off duty early or switch duty assignments.” There might be<br />

an acceptable reason for the latter example, e.g., visiting relatives or a special, detachment-related<br />

project. The important point here is that the Detachment Commander<br />

should be aware of the reason for these behaviors, and, if appropriate, take action to<br />

decrease undesirable or dangerous behaviors.<br />

Field Review: An Iterative Process. There were two rounds of field review of the<br />

Behavior Indicators Checklist. In both cases, the checklist was taken out to MSG detachments<br />

and feedback was obtained in small group (or one-on-one) structured interviews<br />

with incumbent MSGs, Detachment Commanders, and a number of the Department<br />

of State officials who work with MSGs in the detachments. Sites were selected<br />

with the following criteria in mind: (1) detachments with Commanders who had a fair<br />

amount of experience in the MSG program; (2) as much geographical dispersion as<br />

possible, within the constraints of our budget; (3) sites that varied in terms of their perceived<br />

desirability (a function of potential threat and of general desirability and hospitality<br />

of the location); (4) detachments that varied in terms of their size, i.e., number of<br />

MSGs; and (5) at least some detachments where there was an obviously high threat of<br />

counter intelligence activity (e.g., Eastern Bloc countries).<br />

The first round of site visits included Vienna, Prague, Belgrade, and Athens. In the<br />

interviews at each detachment, the draft checklist was discussed, item by item, to address<br />

the following issues:<br />

(1) the appropriateness and clarity of the wording;<br />

(2) the extent to which each behavior did, in fact, indicate a potential personnel problem;<br />

(3) the comprehensiveness of the list of behaviors, i.e., whether there were any behavior<br />

indicators that we had overlooked; and<br />

(4) the response format that should be used for the checklist.<br />



Based on the feedback received from the first round of site visits, a specific response<br />

format was selected, and a number of revisions were made to the checklist, including<br />

specific wording changes to increase clarity or applicability, the addition of several<br />

behaviors and the deletion of a few, and the combining of two categories that were seen<br />

as overlapping. This draft was reviewed by MSG Battalion personnel, including the<br />

MSG School Instructors/Advisors, and the second round of site visits was scheduled.<br />

There were six detachments visited in the second field review, one in the Middle East,<br />

four in sub-Saharan Africa, and one in Western Europe. The same format was followed

for these site visits in terms of the individuals interviewed and the topics covered. There<br />

were several more suggestions for additions and deletions, and a number of further<br />

wording changes recommended. The checklist was again revised, based on these<br />

recommendations, and was again reviewed by MSG Battalion personnel. The final set of<br />

categories were entitled:<br />

A. Job Performance
B. Liberty Behavior
C. Drinking Behavior
D. Personal Relations/Associations
E. Social Behavior
F. Emotional Behavior
G. Money-Related Behavior
H. Physical Health and Appearance

Each category had a list of relevant behaviors, an Overall Rating Scale, and a space to<br />

write comments related to that category of behavior. There were four response options<br />

for each behavior: “Definitely Yes”, “Yes Somewhat”, “Definitely No”, and “Not Relevant”.<br />

Every “Yes” response required a written comment in the space provided for that<br />

category. The Overall Rating Scale for each category was a seven-point scale, where the<br />

lowest rating indicated that there were “Definite Problems”, and the highest rating indicated<br />

that the MSG’s “Behavior [was] Always Exemplary”.<br />

Since a number of the behaviors in the checklist were most appropriate or most critical<br />

for countries with a high threat of counter intelligence activity (e.g., Eastern Bloc countries),<br />

these behaviors were identified as such. Examples are: behaviors related to “fraternization”<br />

and behaviors related to using the “buddy system” whenever leaving the<br />

Embassy compound.<br />

Trial Usage and Evaluation. As an additional check on the readiness of the Behavior<br />

Indicators Checklist, Detachment Commanders were asked to use it for several months<br />

on a “For Research Only” basis. Commanders were briefed on the purpose of the checklist<br />

and were instructed to fill one out for each MSG in their detachment, after the MSG<br />

had been with the detachment for 90 days. They were further instructed to mail completed<br />

checklists directly to the researchers.<br />

A total of 792 completed checklists were received. These were reviewed for response<br />

errors, e.g., checking two response options for one behavior; and for illogical patterns of<br />

responding, e.g., many negative behaviors checked in a category, with a very high overall<br />

rating for that category. Additionally, all written comments were reviewed. Based on<br />

these investigations, there did not appear to be any problems with the response format or<br />

with the overall clarity and understandability of the checklist.<br />
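A minimal sketch of the kind of consistency screen described above, flagging response errors and illogical patterns within one checklist category. The field names, thresholds, and example record are hypothetical illustrations, not the actual CVAL form layout.

    # Flag response errors (two options checked for one behavior) and illogical
    # patterns (several "yes" behaviors but a near-top overall category rating).
    VALID = {"Definitely Yes", "Yes Somewhat", "Definitely No", "Not Relevant"}

    def screen_checklist(category):
        """category: {'responses': {behavior: response or list of responses}, 'overall': 1-7}"""
        problems = []
        yes_count = 0
        for behavior, resp in category["responses"].items():
            if isinstance(resp, (list, tuple, set)) and len(resp) > 1:
                problems.append(f"two options checked for: {behavior}")
                continue
            value = resp if isinstance(resp, str) else next(iter(resp))
            if value not in VALID:
                problems.append(f"invalid response for: {behavior}")
            elif value in ("Definitely Yes", "Yes Somewhat"):
                yes_count += 1
        if yes_count >= 3 and category["overall"] >= 6:
            problems.append("illogical pattern: several 'yes' responses with a near-top overall rating")
        return problems

    example = {"responses": {"Often disorderly when drinking": "Definitely Yes",
                             "Frequently asks to switch duty": ["Definitely Yes", "Definitely No"]},
               "overall": 7}
    print(screen_checklist(example))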

There was also an attempt made to gather some other criterion data for the MSGs for<br />

whom we had completed checklists, to see if the patterns of response on the checklist<br />

made at least intuitive sense when compared to another performance/reliability measure.<br />




The most logical criterion data for this purpose were the available records of Non Judicial<br />

Punishments (NJP) and Reliefs for Cause (RFC). The base rates for these criteria,<br />

however, are so low that there were very few cases (N=40) where we had both a completed<br />

checklist and a record of either NJP or RFC. All 40 “matches” were with NJPs;<br />

there were no matches with RFC. In over half of these 40 cases, the NJP predated the

completion of the checklist, so they were of no use in determining if the checklist could<br />

predict personnel problems. Checklists for the remaining few matches were examined<br />

and the response patterns did indeed seem to indicate that behavior problems were detected<br />

prior to the incident that incurred the Non Judicial Punishment.<br />

Near the end of the “For Research Only” usage period, a questionnaire was sent to all<br />

140 detachments, asking for an evaluation of the checklist and of the User’s Guide that<br />

accompanied it. Subjects covered by this questionnaire included:<br />

(1) Clarity of User’s Guide and checklist content;<br />

(2) Clarity of format (“user friendliness”);<br />

(3) Ease/Difficulty of making accurate ratings;<br />

(4) Time to complete the checklist/extent of administrative burden;<br />

(5) Completeness of checklist;<br />

(6) Usefulness of checklist; and<br />

(7) Recommendation for continued use.<br />

There were 106 questionnaires returned; a 76% return rate. The results can be summarized<br />

as follows. Both the User’s Guide and the checklist itself were reported to be clear,<br />

understandable, and “user friendly”. It was “fairly” to “very” easy to make accurate<br />

ratings. It took an average of 28 minutes to complete the checklist, and was considered<br />

to be a “reasonable” to “minimal” administrative burden (versus “excessive”). The list of<br />

behavior indicators on the checklist was considered to be “very complete“, and it was<br />

reported to be “pretty useful” (this was the second to highest usefulness response option;<br />

the highest was “extremely useful”). Recommendations regarding continued use of the<br />

checklist were:<br />

Yes, as it stands       76
Yes, with revisions     20
No                       7
No response              3
Total                  106

Of the twenty Detachment Commanders that indicated “Yes, with revisions”, most did<br />

not make specific recommendations for revision. Those who did comment on this<br />

recommendation referred to more procedural revisions, rather than revisions to the<br />

checklist items (e.g., use the checklist as a formal Counseling Sheet).<br />

Final Implementation of Checklist<br />

The final draft of the CVAL Behavior Indicators Checklist is now ready for implementation.<br />

An outline of the guidelines recommended for its use follows:<br />



(1) The keynote in interpreting the CVAL checklist is to look for behavioral change over<br />

time; to look for patterns that are out of character for that individual. For example, if<br />

a Marine is typically fairly quiet, then it should be of little concern that he doesn’t

engage in a lot of casual conversation with his fellow Marines. If, on the other hand,<br />

a Marine is usually very outgoing and talkative, and he suddenly “goes quiet”, there<br />

may be a problem.<br />

(2) Not all behaviors on the checklist are particularly damning in and of themselves.<br />

Although there are no items on the checklist that represent perfectly healthy behavior<br />

for someone in the MSG position, there may be a reasonable explanation for an MSG<br />

exhibiting a particular behavior. Virtually every behavior on the checklist, however,<br />

should motivate the Detachment Commander to ask “Why?“. If there is no apparent<br />

reason for the behavior, attempts should be made to find out what the trouble is, for<br />

example, by observing the MSG more closely, or talking with him about the behavior.<br />

(3) The severity of some of the checklist behaviors depends significantly upon detachment location. For example, there are obvious differences in the implications some

behaviors have for Eastern Bloc countries versus other countries.<br />

References<br />

Bosshardt, M. J., DuBois, D. A., & Crawford, K. (1990). Continuing assessment of cleared personnel in the military services: Findings and recommendations (Institute Report No. 193). Minneapolis, MN: Personnel Decisions Research Institutes.

DuBois, D. A., Bosshardt, M. J., & Crawford, K. (1990). Continuing assessment of cleared personnel in the military services: A conceptual analysis and literature review (Institute Report No. 190). Minneapolis, MN: Personnel Decisions Research Institutes.

Houston, J. S. (1989). Development of measures of Marine Security Guard performance and behavioral reliability (Institute Report No. 171). Minneapolis, MN: Personnel Decisions Research Institutes.

Houston, J. S., Wiskoff, M. F., & Sherman, F. (in press). A measure of behavioral reliability for Marine Security Guards: A final report (PERSEREC-SR-90-m). Monterey, CA: Defense Personnel Security Research and Education Center.

Parker, J. P., Wiskoff, M. F., McDaniel, M. A., Zimmerman, R. A., & Sherman, F. (1989). Development of the Marine Security Guard Life Experiences Questionnaire (PERSEREC-SR-89408). Monterey, CA: Defense Personnel Security Research and Education Center.

Wiskoff, M. F., Parker, J. P., Zimmerman, R. A., & Sherman, F. (1989). Predicting school and job performance of Marine Security Guards (PERSEREC-SR-89-013). Monterey, CA: Defense Personnel Security Research and Education Center.


SYMPOSIUM: JOB PERFORMANCE TESTING FOR ENLISTED PERSONNEL<br />

J. H. Harris (Chair), Charlotte H. Campbell,<br />

and Roy C. Campbell<br />

NO ABSTRACT RECEIVED<br />



NAVY: HANDS-ON AND KNOWLEDGE TESTS FOR THE NAVY RADIOMAN

Earl L. Doyle and Roy C. Campbell<br />

Human Resources Research Organization<br />

Introduction<br />

The Navy approach to the Job Performance Measurement Project focussed on<br />

the development of benchmark hands-on job proficiency tests which would, in<br />

turn, guide the development of written task-specific tests and written general<br />

knowledge tests that could be used as substitute measures of job performance,<br />

One of the jobs selected for this effort was the entry level Radioman (RM).<br />

These individuals qualify for their rating by graduating from the Navy Class A<br />

Radioman school at San Diego, California. After qualification they typically<br />

serve in one of two types of&facilities--either a shore-based installation or<br />

on board ship,<br />

This paper will review the major steps in development, the highlights of<br />

field test administration, and the principal findings of this research.<br />

Hands-On Tests<br />

Test Development<br />

Tasks to be tested were selected by a panel of experts consisting<br />

primarily of Senior Radiomen from the Navy Class A Radioman School (Lammlein,<br />

1987). Twenty-two tasks were initially identified. Project test developers,<br />

working with Radioman School instructional staff, integrated those tasks that<br />

are normally closely associated when performed on the job. This resulted in<br />

the development of the 14 tests shown in Table 1.<br />

Table 1
Radioman Tasks for Hands-On Tests

*Act as a Broadcast Operator
*File Messages
Change Paper/Ribbons on Teletype
Establish System - November
Perform Maintenance on Receiver
*Prepare Message - DD173
*Type/Format/Edit Message
*Log Incoming Messages
*Manually Route Messages
Establish System - Golf
*Inventory Classified Documents
Perform Maintenance on Transmitter
*Verify Outgoing Message
*Prioritize Outgoing Messages by Precedence and Time

*Indicates product scored test.

The developed tests were based on an analysis of the individual and<br />

component tasks and consisted of dichotomously scored (GO/NO-GO) performance<br />

measures corresponding to steps done or characteristics of products produced.<br />



The performance tests utilized product scoring wherever a product was produced<br />

as a complete or partial result of the performance. Where feasible, product<br />

scoring is desirable because, correctly administered, it can enhance<br />

reliability. The nine tests that utilized at least partial product scoring<br />

are identified as such in Table 1.<br />

In addition to the scoresheets, developers prepared equipment setup<br />

instructions, instructions to the examinees, and scoring instructions. The<br />

entire test was designed to be administered at a single station using either<br />

an actual or a simulated ship's radio shack. Although the 14 tests were<br />

independent, they were operationally interconnected so they fit logically and<br />

sequentially into the test situation and location.<br />

Written Tests<br />

Written tests were developed that corresponded to the 22 tasks covered<br />

in the hands-on tests. Three features characterized these tests:<br />

- The tests were performance or performance-based. Items were based on either performing the same steps required in the hands-on test or in answering a question of how a step is done.

- The tests were founded on performance errors. To ensure items were performance oriented, the causes of error in performance were identified. Error was identified as having four origins: the Radioman did not know where to perform (location), did not know when to perform (sequence), did not know what the product of correct performance was (recognition), or did not know how to perform (technique).

- The tests provided likely behavioral alternatives. Incorrect alternatives were based on likely errors that were possible and do occur on the job. Incorrect alternatives also had to be wrong, not merely less desirable than the correct alternative.

The development result was an 87 item test in a multiple choice format<br />

that was organized into 11 topical, functional task areas that generally<br />

corresponded to the 14 hands-on test areas. (Several of the hands-on test<br />

areas that needed to be treated separately for administrative and equipment<br />

set-up requirements were combined for the written test, and one written test<br />

task area did not survive validation.) These 11 written test areas were<br />

organized so they could be administered and analyzed independently.<br />

General Knowledge Test

The third area of RM testing was a written general knowledge test. Like<br />

the written performance test, this was a multiple choice test and was based on<br />

the same tasks that generated the hands-on tests. The difference between the<br />

two written tests was that the written performance test was specifically<br />

designed to measure performance while the general knowledge test measured the<br />

application of knowledge to the task subject--which may not necessarily<br />

reflect performance. For example, the written performance test might describe<br />

a situation and ask what EMCON condition should be imposed under those circumstances; the general knowledge test might ask what EMCON is.



The general knowledge test consisted of 98 items. It was not separated<br />

by task or functional area, and in administration and analysis was treated as<br />

a single test.<br />

Test Administration<br />

The field tests were administered to 61 Radiomen, all of whom were<br />

graduates of the Class A Radioman School, were in paygrades E-2, E-3, and E-4,<br />

and had graduated from the School between 1 and 59 months prior to testing.<br />

(Of the tested population, 79% were in paygrade E-3 and 60% were in the 12<br />

months to 35 months experience window.) Twenty-eight of the 61 sailors tested<br />

were assigned to shore installations at the time of testing and 33 were aboard<br />

ships.<br />

Testing was conducted at two locations and about a month apart. Testing lasted for 8 hours for each examinee and the three components of the test were

sequentially counterbalanced. Five hands-on scorers were used. All scorers<br />

were project staff and had received extensive task/test training and<br />

calibration. Each Radioman was scored independently by at least two scorers<br />

for each hands-on test.<br />

Field Test Results<br />

Although a wide variety of analyses were conducted (Ford, Doyle,<br />

Schultz, & Hoffman, 1987), this paper will focus on four main areas of<br />

interest. Specifically:<br />

- Interrater reliability of the hands-on tests.
- Internal consistency within test methods.
- Intercorrelations among test methods.
- Assignment effect (ship vs. shore).

Interrater Reliability of the Hands-On Tests

Interrater reliability estimates were computed using a generalizability theory approach in which absolute generalizability coefficients were produced (SAS, 1982; Brennan, Jarjoura & Deaton, 1980). Generalizability estimates were obtained as if only one rater score were produced and for an average of the two raters, as shown in Table 2.
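The Brennan, Jarjoura and Deaton (1980) procedures are not reproduced here; the sketch below shows a generic absolute (dependability) coefficient for a fully crossed persons-by-raters design, for a single rater and for the average of two raters, using simulated scores rather than the study data.

    # Simplified absolute generalizability (dependability) coefficients for a
    # persons x raters design, estimated from two-way ANOVA variance components.
    import numpy as np

    def phi_coefficients(scores, k=2):
        """scores: persons x raters array; returns (phi for 1 rater, phi for k raters)."""
        n_p, n_r = scores.shape
        person_means = scores.mean(axis=1)
        rater_means = scores.mean(axis=0)
        grand = scores.mean()
        ms_p = n_r * np.sum((person_means - grand) ** 2) / (n_p - 1)
        ms_r = n_p * np.sum((rater_means - grand) ** 2) / (n_r - 1)
        resid = scores - person_means[:, None] - rater_means[None, :] + grand
        ms_e = np.sum(resid ** 2) / ((n_p - 1) * (n_r - 1))
        var_p = max((ms_p - ms_e) / n_r, 0.0)   # person (universe score) variance
        var_r = max((ms_r - ms_e) / n_p, 0.0)   # rater variance
        abs_error = var_r + ms_e                # absolute error variance, one observation
        return var_p / (var_p + abs_error), var_p / (var_p + abs_error / k)

    rng = np.random.default_rng(2)
    true_score = rng.normal(50, 10, size=61)                        # 61 Radiomen
    scores = true_score[:, None] + rng.normal(0, 3, size=(61, 2))   # two raters
    print(phi_coefficients(scores))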

The reliabilities are exceptionally high. This is attributed, first, to the firm control over the scorers that was possible because they were members of the project staff and, second, to the high incidence of product scoring among the tested tasks.

Internal Consistency Within Test Methods

Intertask correlations were computed for the hands-on, written, and general knowledge tests (the general knowledge test was analyzed for interitem correlations since it was treated as a single test) and are presented in Table 3. The obtained coefficients demonstrate acceptable levels of internal consistency.

531



Table 2<br />

Generalizability Coefficients for Hands-On Tests<br />

Task One Rater Two Raters<br />

*Broadcast Operator 0.96 0.98<br />

*Log Messages 0.96 0.98<br />

*File Messages 0.95 0.98<br />

*Manually Route Messages 0.95 0.98<br />

Change Paper/Ribbons 0.60 0.75<br />

Establish System - Golf 0.91 0.95<br />

Establish System - November 0.94 0.97<br />

*Inventory Classified Documents 0.69 0.82<br />

Preventive Maintenance - Receiver 0.93 0.96<br />

Preventive Maintenance - Transmitter 0.95 0.98<br />

*Prepare Message DD173 0.98 0.99<br />

*Prioritize Outgoing Messages 0.90 0.95<br />

*Type/Format/Edit 0.96 0.98<br />

*Verify Outgoing Messages 0.97 0.98<br />

*Indicates primarily product scored tests.<br />

Table 3<br />

Intertask/Item Correlations by Test Component<br />

Component Correlation<br />

Hands-On 0.89<br />

Written Performance 0.74<br />

General Knowledge 0.71<br />

Intercorrelations Among Test Methods

Correlations, particularly between hands-on and written performance<br />

tests, are important because of the possibility of substituting written tests<br />

for resource-demanding hands-on tests. The correlations between written and<br />

hands-on task tests are shown in Table 4 and the overall correlations between<br />

test methods are shown in Table 5.

532



Table 4

Correlations Between Written Tests and Hands-On Tests

Written Test                               Correlation
Broadcast Operator                         .228
Maintain Comm Center File                  .370** & .282*
Manually Route Messages                    .422**
Establish Systems - Golf/November          .596** & .101
Inventory Classified Documents             .557**
Preventive Maintenance - Receiver          .523**
Preventive Maintenance - Transmitter       .234*
Verify Outgoing Message                    .299*
Prioritize Messages                        .375**
Type/Format/Edit                           .093
Prepare Message DD173                      .447**

Note. Double correlation figures indicate a single written test covered two hands-on tests.

*Significance: p<.05. **Significance: p<.01.

Table 5

Correlations Among Scores by Test Method

Test Method            Hands-On    Written Performance    General Knowledge
Hands-On                  --              .71*                  .61*
Written Performance      .71*               --                  .68*
General Knowledge        .61*              .68*                   --

*Significance: p<.01

These correlations are very high. In a previous study (Rumsey, Osborn, & Ford, 1985) the authors looked at correlations between hands-on and written tests for 28 occupations; the overall correlation for the 28 jobs was .41. Selecting the eight occupations that are most similar to the Radioman, the hands-on--written correlation is .45, and for the military job most like the Radioman's--the Army Radio-Teletype Operator--the correlation is .37. Again, much of the notable result for the Radioman in this area is believed to be a direct consequence of the high rater reliability.

533



Assignment Effect (Ship vs. Shore)

A comparison of the performance of Radiomen on all test methods revealed<br />

marked differences depending on whether the sailors were shore-based or ship-based, with the ship-based examinees consistently scoring higher. For the hands-on tests, this difference was significant (at p



Interrater Reliability as an Indicator of<br />

HOPT Quality Control Effectiveness

Major P. J. Exner, USMC

HQ USMC<br />

Jennifer L. Crafts<br />

Daniel B. Felker<br />

Edmund C. Bowler<br />

American Institutes for Research<br />

Paul W. Mayberry<br />

Center for Naval Analyses<br />

The United States Marine Corps Job Performance Measurement<br />

Project is attempting to validate enlistment quality requirements<br />

against actual on-the-job requirements. Since there are nearly<br />

500 Military Occupational Specialties (MOSs), developing hands-on performance tests (HOPTs) for each MOS is impractical. Therefore the Marine Corps has elected to test relatively large numbers of Marines in a few critical MOSs in each of the four Armed Services

Vocational Aptitude Battery composites used for classification.<br />

Testing began with the General Technical (GT) composite in 1986-87 for the infantry occupational field. In 1989-90 tests for Mechanical Maintenance (MM) composite MOSs were developed and

administered. In August, 1990, hands-on testing was completed on<br />

approximately 1900 Marine automotive and helicopter mechanics.<br />

Because of the many possible sources of error in the<br />

development and administration of HOPTS, quality control is<br />

critical at every step. Poor test design or execution can<br />

significantly reduce validities and diminish the value of the<br />

results. In a preliminary Marine Corps study Maier (1988)<br />

reported a large reduction in validities due to various errors.<br />

Such errors can include content, test design, test administrator<br />

(TA) training, environmental, temporal, and other effects. One<br />

indicator of possible problems is interrater reliability, or TA agreement.

In this paper, we will review the quality control

measures used in MM testing and examine preliminary reliability<br />

results across task, test site, MOS, and time.<br />

A series of quality control measures were used to ensure the<br />

quality of hands-on performance data. They include: recruitment<br />

of former or retired Marines to serve as TAs; selection of TA<br />

applicants based on scores on structured interviews; standardized<br />

test site setup; extensive and ongoing training of TAs; rotation

This research was funded by Contract No. N00014-87-C-0001 and by<br />

subcontract CNA 4-89. All statements expressed in this paper are

those of the authors and do not necessarily reflect the official<br />

views or policies of the Department of the Navy or the U.S. Marine<br />

Corps.<br />

535



of TAs across tasks; shadow scoring; on-site data entry; and

ongoing counselling of TAs.<br />

Recruiting of Former/Retired Marines as TAs<br />

We sought former or retired Marines to serve as TAs, preferably those with experience in the MOSs which were tested.

This offered several advantages over using civilians or active<br />

duty Marines. Their Marine Corps background enabled them to<br />

relate better to the examinees and promoted a more realistic<br />

testing atmosphere. Also, using former rather than active duty<br />

Marines eliminated a possible bias of Staff Non-Commissioned<br />

Officers toward their troops. Former Marines would have no<br />

vested interest in seeing that "their" mechanics performed well.<br />

Selection of TAs Based on Structured Interviews<br />

All TA candidates were screened using a structured interview<br />

which evaluated their suitability in several categories.<br />

Applicants were questioned concerning their previous experience<br />

in six areas: performance of mechanical tasks; test<br />

administration; administrative duties; planning and organization; public speaking; and vehicle maintenance. For each dimension, applicants were evaluated using a three-point scale indicating no, moderate, or high familiarity. There were more applicants

at the East Coast test sites, but overall TA quality was high at<br />

all locations. West Coast TAs for helicopter testing tended to<br />

be less experienced former Marines than at all other locations.<br />

Standardized Test Site Set Up

Testing was conducted at five test sites. There was one site

for automotive testing on each coast, and a single test site for<br />

helicopter testing on the East Coast. Due to the wide separation<br />

of helicopter assets on the West Coast, it was necessary to set<br />

up two test sites there. To reduce site differences, the same<br />

people were involved in establishing the site requirements and<br />

setup procedures at all test sites for air or ground. Where more<br />

than one test site was set up simultaneously, individuals<br />

directing the set up had previous experience at another test<br />

site. Site directors at all sites were involved in the site<br />

requirements determination from beginning to end. Standardized<br />

aircraft/vehicle, test equipment, parts, tools, publications, and<br />

other requirements lists were prepared for all sites. Local<br />

variations in equipment brands, procedures, and facilities were<br />

carefully analyzed for their possible impact and eliminated or<br />

minimized across all sites.<br />

Extensive and Ongoing TA Training

TAs underwent a thorough week-long training program. Most<br />

had served in the Marine Corps where training had been an<br />

integral part of their responsibilities for years. We stressed<br />

the requirement to avoid giving feedback to the examinee which<br />

536


might influence task performance. TAs were trained on how to<br />

perform each task they were to evaluate and practiced them under<br />

the supervision of active duty subject matter experts. This<br />

included role playing and deliberate errors on the part of the "examinee" to check TA consistency and develop standardized scoring of irregular responses. Once test administration had begun, there were periodic reviews of steps with low interrater

reliabilities, with retraining where necessary.<br />

Rotation of TAs Across Tasks<br />

TAs were trained in multiple tasks to allow them to rotate<br />

among test stations. This lessened the effect of boredom,<br />

provided a cross check on the standardization of scoring in each<br />

task, and reduced the impact of TA differences on scoring.<br />

Shadow Scoring

Perhaps the most important quality control procedure, shadow<br />

scoring involved independent evaluation of an individual's task<br />

performance by two TAs simultaneously. Shadow scorers were used<br />

to monitor TA performance and test reliability, and were<br />

systematically scheduled to capture interactions among testing<br />

order and individual TA characteristics.<br />

On-Site Data Entry Trend Analysis

A Hands-On Score Entry System (HOSES) was developed to enter,<br />

verify, and report analyses of collected data. Daily on-site

data entry enhanced completeness of data and allowed for early<br />

identification of problems with the tests, TA consistency, and<br />

score drift over time. HOSES generated three reports which were<br />

used by site hands-on managers to improve scoring reliability.<br />

1. Data Entry Report. All data were entered twice. This report<br />

verified that there were no discrepancies between the two<br />

entries. It also reported any missing data so the information<br />

could be tracked down on the day of original testing. This<br />

greatly reduced the amount of missing data.<br />

2. The Detailed Discrepancy Report listed all steps where

primary and shadow scorers disagreed. It also gave percent<br />

disagreement for each task, and overall daily total by TA.<br />

3. The Summary Report presented cumulative historical summaries<br />

by TA and task. TA summaries showed leniency and reliability<br />

information for each task administered by the TA. Leniency was measured as a deviation from the mean percentage of "Go"s for all TAs on each task. Reliability indicated disagreement with all other TAs on each task. These were valuable in identifying individual TA problems. Since this report could be broken out by time, it also provided trend information. Task summaries showed percent "Go" and disagreement for each step. This helped focus on test effect problems, i.e., those common across all TAs (a rough computational sketch of these per-TA summaries follows below).
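To make the leniency and disagreement summaries concrete, here is a minimal computational sketch. It is illustrative only: HOSES itself is not described at the code level in this paper, and every function, field, and variable name below is a hypothetical choice.

from collections import defaultdict

def ta_summaries(records):
    """records: iterable of dicts with hypothetical fields 'ta', 'task', and
    'go' (primary scorer's GO/NO-GO for one step), plus optional 'shadow_go'
    when the step was shadow scored. Returns per-(TA, task) leniency and
    disagreement rates in the spirit of the Summary Report."""
    go_counts = defaultdict(lambda: [0, 0])   # (ta, task) -> [n_go, n_scored]
    task_go = defaultdict(lambda: [0, 0])     # task -> [n_go, n_scored] over all TAs
    disagree = defaultdict(lambda: [0, 0])    # (ta, task) -> [n_disagree, n_shadowed]

    for r in records:
        key = (r["ta"], r["task"])
        go_counts[key][0] += r["go"]
        go_counts[key][1] += 1
        task_go[r["task"]][0] += r["go"]
        task_go[r["task"]][1] += 1
        if "shadow_go" in r:
            disagree[key][0] += (r["go"] != r["shadow_go"])
            disagree[key][1] += 1

    summary = {}
    for (ta, task), (n_go, n) in go_counts.items():
        task_mean = task_go[task][0] / task_go[task][1]
        leniency = n_go / n - task_mean        # deviation from all-TA mean percent "Go"
        d_n, d_tot = disagree[(ta, task)]
        disagreement = d_n / d_tot if d_tot else None
        summary[(ta, task)] = {"leniency": leniency, "disagreement": disagreement}
    return summary

Under this sketch, leniency is a TA's proportion of "Go" scores minus the all-TA mean on the same task, and disagreement is the fraction of shadow-scored steps on which the two scorers differed.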

537


Hands-on managers used these reports extensively. Differences<br />

among TAs were discussed, and ambiguities in interpretation of<br />

scoring rules were resolved through discussion and, if required,<br />

additional training. Individual and group trends could be<br />

detected. Individual TA counselling focused on adherence to the<br />

original training standards and the definition and interpretation<br />

of scoreable steps. Hands-on managers avoided overemphasis<br />

on consistency to prevent artificially high levels of agreement.<br />

Interrater Reliability Results<br />

Interrater reliability, or TA agreement, can indicate the<br />

presence of several possible error sources: test design, time,<br />

environmental, or other effects. Interrater reliability is the percentage agreement between primary and shadow scorers on individual task steps. It is computed by dividing the number of

steps on which the primary and shadow scorer agreed by the total<br />

number they both graded, summed across all examinees and all<br />

tasks. It was calculated using all observations where both

primary and shadow step scores were available.<br />
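Restated as a formula (a direct transcription of the computation just described):

\text{Agreement} = \frac{\sum_{e,t} a_{e,t}}{\sum_{e,t} n_{e,t}}

where, for examinee e on task t, a_{e,t} is the number of steps on which the primary and shadow scorers agreed and n_{e,t} is the number of steps both scorers graded.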

Fig. 1: Agreement by Task (agreement between primary and shadow test administrators, by task)

Fig. 2: Agreement by Time Period (agreement between primary and shadow test administrators, by time interval)

Figure 1 shows scorer agreement across tasks for automotive<br />

mechanics. Agreement ranged from .873 to .971 indicating that<br />

TAs could reliably differentiate "Go" and "No Go" performance.

The lowest reliabilities at both sites were on troubleshooting<br />

tasks, indicating some ambiguity in scoring the steps on those<br />

tasks. Three of the lowest four reliabilities occurred on tasks<br />

which were hard to observe because of confined spaces. The fact<br />

that the relative reliabilities among tasks were the same between<br />

sites also indicates a good training program and suggests that<br />

reliability differences were due to test effects.<br />

538


Figure 2 shows temporal effects at auto mechanic test sites.<br />

Site A experienced a slight drop in agreement in the middle time<br />

period. The decline in reliability was noted at the time, so<br />

counselling and retraining were conducted, resulting in an<br />

increase during the final period. Site B agreement increased<br />

during all three periods. The overall increase in agreement at<br />

both sites is natural given increasing familiarity of the TAs<br />

with the scoring standards over time. The fact that the increase<br />

is relatively small indicates that the initial training program<br />

prepared the TAs very well. Again, this points to test effects

as a likely cause of the differences in reliabilities.<br />

This same trend carries over into the helicopter mechanic<br />

reliabilities despite some differences in their HOPTs. Whereas all automotive mechanics were given the same HOPT, each helicopter mechanic MOS had its own test. Helicopter MOSs were

tested sequentially, so temporal effects are evident across<br />

aircraft type, as shown in Figure 3. Since test order varied<br />

across sites, increasing reliability does not appear left to right

for both sites. Test order for Site C was CH-53A/D, CH-46,<br />

CH-53E, and UH/AH-1. Site D order was UH/AH-1, CH-46, and CH-<br />

53E. No CH-53A/D were tested at Site D. Taking this test order<br />

into account we see that agreement increased over time at both<br />

sites, except for the CH-46 at Site C.<br />

Fig. 3: Agreement by Aircraft (agreement between primary and shadow test administrators, by aircraft)

Fig. 4: Agreement by Interview Ratings

The drop in CH-46 agreement is explainable in terms of<br />

variation in test conditions. At Site D, all examinees in a

particular MOS were tested on the same aircraft. At Site C, each<br />

unit set up its own aircraft, resulting in changing conditions



mechanic among the TAs (the rest were from the other three<br />

aircraft). The reduced agreement may partly reflect this<br />

diminished commonality of experience among the TAs. Even so,<br />

over time, the continuing training program and skill transfer<br />

across aircraft resulted in overall increased reliability.<br />

Figure 4 plots agreement versus the initial TA interview<br />

ratings. The strongest correlation with agreement was for TA<br />

applicants who rated high in test administration, public<br />

speaking, and administrative experience. The negative effect of<br />

maintenance and mechanical familiarity may indicate a bias<br />

resulting from experience. Yet in all cases, reliabilities were<br />

acceptable. Interestingly, among all MM TAs, there was no<br />

significant difference between TAs who had also served on the

infantry project several years earlier and the mechanics hired<br />

for this project.<br />

Conclusion<br />

The high reliabilities found in the preliminary analysis are<br />

encouraging. They indicate that the TA training program was<br />

sound, scoring was well standardized across sites, and that the<br />

HOPT steps were discrete and consistently measurable. There were also no indicators of any significant test effects or other

systematic problems with the test that would preclude achieving<br />

the high validities obtained in the infantry study. Finally, the<br />

results have implications for HOPT Test Administrator selection.<br />

This analysis seems to indicate that such qualities as previous<br />

test administration experience and public speaking are more<br />

important than experience in the particular field being tested.<br />

Reference<br />

Maier, M. H. (1988). On the Need for Quality Control in Validation Research. Personnel Psychology, 41, 497-502.

540


ARMY: JOB PERFORMANCE MEASURES FOR NON-COMMISSIONED OFFICERS<br />

Charlotte H. Campbell and Roy C. Campbell<br />

Human Resources Research Organization<br />

The Army approach to criterion measurement for the JPM project focuses<br />

on two stages in the enlisted person's service time: after about two years in<br />

service, and after three to five years, as a non-commissioned officer (NCO,<br />

corporal E4 or sergeant E5). In this presentation, we report on the job

analysis, development of written test, job sample test, and rating scale<br />

instruments, and testing results for NCOs. The analysis and testing were<br />

conducted on nine jobs, or Military Occupational Specialties (MOS), listed in

Table 1.<br />

Table 1

Army Military Occupational Specialties (MOS)

11B   Infantryman
13B   Cannon Crewmember
19E   Armor Crewman
31C   Single Channel Radio Operator
63B   Light Wheel Vehicle Mechanic
71L   Administrative Specialist
88M   Motor Transport Operator
91B   Medical NCO
95B   Military Police

Job Analysis

For each MOS, a job analysis was performed by aggregating all available information to define a population of tasks.¹ Sources of job- and task-

analytic information included Soldier's Manuals (both MOS-specific and Common<br />

Task), Army Occupational Survey Program data on performance frequency, data on<br />

¹Job analysis details may be found in J. P. Campbell (Ed.), Improving the Selection, Classification, and Utilization of Army Enlisted Personnel: Annual

Report, 1987 Fiscal Year (HumRRO Report IR-PRD-88-18), October 1987.<br />

This research was funded by the Army Research Institute on two projects: Improving the Selection, Classification, and Utilization of Army Enlisted Personnel (Project A) (Project No. MDA903-82-C-0531), and Building the Career Force (Project No. MDA903-89-C-0202). Project Director is J. H. Harris, and Principal Scientist is J. P. Campbell, both of Human Resources Research Organization. Contracting Officer's Technical Representative is Dr. M. G. Rumsey, who is the Chief of the Selection and Classification Technical Area of

the Army Research Institute for the Behavioral and Social Sciences. The views expressed herein are those of<br />

the authors and do not necessarily represent the official position of the Army Research Institute or the<br />

Department of the Army.<br />

541



frequency and importance of supervisory tasks from a special administration of<br />

the Leader Requirements Survey, collection and content analysis of critical<br />

incidents, and interviews with MOS incumbents.<br />

The resulting job domain included supervisory, common, and MOS-specific<br />

tasks and behaviors. Army policy designates certain tasks as being part of<br />

the job for corporals and sergeants; tasks at lower skill levels were included<br />

in the domain because of the Army's policy that soldiers are responsible for<br />

such tasks, and tasks at higher skill levels were included if there was<br />

evidence that soldiers in fact performed such tasks.<br />

Instrument Development²

Information collected using the critical incident methodology was used<br />

to construct a series of rating scales for each MOS, as well as scales that<br />

were not specific to any one MOS but rather reflected Army-wide behaviors.<br />

These scales were used to measure behaviors on all three components of the job<br />

domain -- supervisory, common, and MOS-specific -- by means of ratings<br />

collected from soldiers' supervisors. The 7-point rating scales were<br />

behaviorally-anchored, that is, short descriptions of behaviors that<br />

characterize the low, middle, and high points of each of the scales were<br />

provided. Army-wide supervisory behaviors (e.g., Monitoring, Organizing<br />

Missions and Operations) were addressed by 12 of the scales, 9 scales were<br />

Army-wide and non-supervisory (or common, e.g., Following Regulations and<br />

Orders, Physical Fitness), and for each MOS there were between 7 and 14 MOS-specific

dimensions.<br />

For the task-based information, judgments were obtained from subject<br />

matter experts (SMEs) on several task parameters, including performance

difficulty, performance variability, and criticality. The task list for each<br />

MOS was clustered into functional areas, and a second panel of SMEs selected<br />

proportional systematic samples from the task population. These task samples<br />

were subjected to formal reviews by the proponent.<br />

At this point, the task-based instrument development process diverged<br />

into four separate approaches: Job knowledge (written) tests, hands-on job<br />

sample tests, role-play simulations, and written situational judgment tests.<br />

Multiple-choice job knowledge test items were constructed for all of the MOS-specific

and common tasks selected for each MOS. These tests are<br />

characterized by their orientation on task performance and by the extensive<br />

use of graphics and job-relevant contextual information. For each MOS, a one-hour

test of both common and MOS-specific tasks was prepared, comprising<br />

approximately 120 items. Two scores were constructed, for common tasks and<br />

²Details of instrument development are presented in J. P. Campbell (Ed.),

Building the Career Force, First Year Report (in preparation). Rating scales

development and Situational Judgment Test development were directed by W. C.<br />

Borman and M. Hanson of Personnel Decisions Research Institute, Inc. Role-play

development was directed by E. D. Pulakos of Human Resources Research<br />

Organization and D. Whetzel of the American Institutes for Research.<br />

Development of hands-on and job knowledge tests was directed by C. H. Campbell<br />

and R. C. Campbell of Human Resources Research Organization, and D. B. Felker

of the American Institutes for Research.<br />

542


for MOS-specific tasks, as the percent of items answered correctly on tasks in<br />

each area.<br />

Hands-on job sample tests were developed to test performance on 8-14 of<br />

the tasks selected for each MOS. The tasks that were allocated to the hands-on

component included, by design, both common and MOS-specific tasks, at the<br />

target skill level as well as lower and higher skill levels, and from as many<br />

functional areas as was feasible for testing. Scores were constructed as the<br />

percent of steps performed correctly for a given task, averaged across the<br />

common or MOS-specific tasks.<br />
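A minimal sketch of that scoring rule, under assumed data layouts (the task names and structures here are hypothetical, not the project's actual scoring software):

def task_score(step_results):
    """Percent of steps performed correctly on one task (GO = True, NO GO = False)."""
    return 100.0 * sum(step_results) / len(step_results)

def component_score(tasks):
    """Average task score across the common (or the MOS-specific) tasks."""
    scores = [task_score(steps) for steps in tasks.values()]
    return sum(scores) / len(scores)

# Example: two hands-on tasks for one soldier.
print(component_score({"Task A": [True, True, False, True],
                       "Task B": [True, True, True]}))  # (75.0 + 100.0) / 2 = 87.5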

Examination of the supervisory tasks selected for each MOS revealed a<br />

common structure of three areas of supervisory behaviors across the nine MOS:<br />

Personal Counseling, Disciplinary Counseling, and Training. To measure these<br />

three aspects of the job, simulation exercises (role-plays) were developed.<br />

The role of a private was played by a trained civilian test scorer (three<br />

different scorers performed the three roles for a given soldier). At the<br />

conclusion of a role-play, the actor/scorer rated the soldier on 12-18 aspects<br />

of behavior during the exercise. Each aspect was rated by means of a 3-point

behaviorally-anchored rating scale, and an overall score was computed as the<br />

average across the three role-plays of the mean rating on items within the<br />

role-play.<br />

The written situational judgment tests were designed to tap those areas<br />

of supervisory behaviors that could not be included in the role-plays. They<br />

were intended to evaluate the effectiveness of the NCO's judgments about what<br />

to do in difficult supervisory situations, and were meant to tap the cognitive<br />

aspects of first-line supervisory practice in the Army. The test contained 35<br />

items, consisting of a situation and 3-5 alternative courses of action;<br />

soldiers indicated which response alternatives they believed to be the most<br />

and the least effective. Effectiveness weights were assigned to each response<br />

of each item with the assistance of the Sergeants-Major Academy, and item<br />

scores were computed as the weight of the soldier's "Most Effective" response<br />

minus the weight of the soldier's "Least Effective" response. The total score<br />

was the mean of the item scores.<br />
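The scoring rule just described can be illustrated with a short sketch (the variable names and example weights are assumptions for illustration, not the project's actual values):

def score_sjt(responses, weights):
    """responses: list of (most_choice, least_choice) index pairs, one per item.
    weights: per-item lists of effectiveness weights for each alternative.
    Item score = weight of the "Most Effective" choice minus weight of the
    "Least Effective" choice; total score = mean of the item scores."""
    item_scores = [
        weights[i][most] - weights[i][least]
        for i, (most, least) in enumerate(responses)
    ]
    return sum(item_scores) / len(item_scores)

# Example: two items, each with alternatives weighted on effectiveness.
weights = [[1.0, 2.5, 0.5], [3.0, 1.5, 2.0, 0.5]]
responses = [(1, 2), (0, 3)]          # (index picked as Most, index picked as Least)
print(score_sjt(responses, weights))  # (2.5-0.5 + 3.0-0.5) / 2 = 2.25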

Figure 1 portrays the test mode (written, job sample, and ratings) by<br />

job component (supervisory, common task, and MOS-specific) coverage among the<br />

testing instruments.<br />

Test Administration and Results<br />

Data were collected from 1009 soldiers and their supervisors (rating<br />

scales only) in the nine MOS at 13 Army posts CONUS and in Germany. The<br />

hands-on tests were administered by NCO scorers under the supervision of

trained civilian staff; all other instruments were administered by trained<br />

members of the project staff.<br />

Table 2 gives the basic statistical characteristics for each instrument,<br />

across the nine MOS. For every instrument, the mean scores are above the<br />

midpoint. However, there is no great evidence of skew in the data, and the<br />

reliability estimates are satisfactory.<br />

543


Test Mode: WRITTEN TESTS
  Supervisory -- Situational Judgment Test (mean of effectiveness weight for "M" responses minus effectiveness weight for "L" responses)
  Common Task -- Job Knowledge Tests of Common Tasks (percent items correct)
  MOS-Specific -- Job Knowledge Tests of MOS-Specific Tasks (percent items correct)

Test Mode: JOB SAMPLE TESTS
  Supervisory -- Supervisory Role-Plays (mean across role-plays of ratings on 3-point effective behavior scales)
  Common Task -- Hands-On Tests of Common Tasks (mean across tasks of percent steps passed)
  MOS-Specific -- Hands-On Tests of MOS-Specific Tasks (mean across tasks of percent steps passed)

Test Mode: RATINGS
  Supervisory -- Rating Scales, Army-Wide Supervisory Dimensions (mean across dimensions of supervisor ratings on 7-point rating scales)
  Common Task -- Rating Scales, Army-Wide Non-Supervisory Dimensions (mean across dimensions of supervisor ratings on 7-point rating scales)
  MOS-Specific -- Rating Scales, MOS-Specific Scales (mean across dimensions of supervisor ratings on 7-point rating scales)

Figure 1. Testing instruments providing coverage of each job component (supervisory, common task, and MOS-specific), by test mode.

Table 2

Statistical Characteristics of Test Instruments Across Nine MOS

                                    Supervisory           Common              MOS-Specific
                                  Mean   SD   Rel.    Mean   SD   Rel.    Mean   SD   Rel.
Situational Judgment Tests        1.37   .60   .75      --    --    --      --    --    --
Job Knowledge Tests                 --    --    --    65.4  12.5   .79    64.9  13.5   .73
Supervisory Role-Plays            2.26   .42   .71      --    --    --      --    --    --
Hands-On Tests                      --    --    --    72.6  15.4   .46    69.4  19.5   .44
Rating Scales - Army-Wide         4.49  1.06   .50    5.13  1.13   .48      --    --    --
Rating Scales - MOS-Specific        --    --    --      --    --    --    5.19  0.97   .43

Note. Situational judgment test results ranged from -.77 to 2.57 (thus the mean score of 1.37 is roughly equivalent to a score of 4.46 on a 7-point scale, with a standard deviation of 1.26); the reliability estimate is split-half on items, corrected to test length.

Job knowledge test and hands-on test scores are proportions correct; the reliability estimate for job knowledge tests is the median across MOS of a split-half on odd-even items, corrected to test length; the reliability estimate for hands-on tests is the median across MOS of the split-half on task scores, corrected to number of tasks.

Ratings were made on a 7-point scale, where a 1 represents poor performance; reliability estimates are one-rater reliabilities across dimensions, using the median across MOS for MOS-specific ratings.

Role-play ratings were made on a 3-point scale, where a 1 represents less effective supervision; reliability estimates are the median one-rater reliability across items, averaged across the three role-plays.
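The "corrected to test length" and "corrected to number of tasks" adjustments in the note are presumably the usual Spearman-Brown correction; the paper does not state the formula, so it is given here only for reference:

r_{full} = \frac{2\, r_{half}}{1 + r_{half}}, \qquad \text{or in general} \qquad r_k = \frac{k\, r}{1 + (k - 1)\, r}

for a measure lengthened by a factor of k.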

544


Table 3 shows the intercorrelations among the instruments across the<br />

nine MOS. For the rating scales and, to a lesser degree, for the written<br />

tests, there are high correlations across the different job components. This<br />

may indicate that the test mode itself is responsible for much of the observed<br />

variance. Because the raters for a soldier were the same individuals for all<br />

three sets of scales, we would expect the results to be correlated; likewise,<br />

we expect scores on written multiple-choice tests to be correlated simply<br />

because of the cognitive processing burden imposed by the written material.<br />

The job sample tests, on the other hand, are less affected by the similarity<br />

of method, not surprising in view of the fact that nearly every job sample<br />

exercise (hands-on task test or role-play situation) is conducted and scored<br />

by a different administrator.<br />

Table 3

Intercorrelations (Uncorrected) Among Test Modes (Written, Job Sample, and Rating Scales) and Job Components (Supervisory, Common Task, and MOS-Specific)

                                          Written Mode        Job Sample Mode       Ratings Mode
                                         Sup.  Com.  MOS     Sup.  Com.  MOS      Sup.  Com.  MOS
WRITTEN MODE
  Supervisory (Situational Test)         1.00
  Common Task (Job Knowledge Test)        .40  1.00
  MOS-Specific (Job Knowledge Test)       .34   .48  1.00
JOB SAMPLE MODE
  Supervisory (Role-Plays)                .12   .19   .13    1.00
  Common Task (Hands-On Test)             .09   .30   .20     .10  1.00
  MOS-Specific (Hands-On Test)            .11   .23   .42     .06   .17  1.00
RATINGS MODE
  Supervisory (Army-Wide Ratings)         .17   .13   .13     .10   .08   .08     1.00
  Common Task (Army-Wide Ratings)         .13   .12   .09     .07   .06   .09      .71  1.00
  MOS-Specific (MOS Ratings)              .11   .15   .12     .05   .07   .09      .74   .64  1.00

The correlations between different test modes measuring the same job<br />

components are highlighted in the table. The correlations between the two<br />

task-based instruments (job knowledge tests and hands-on tests) are relatively<br />

high even across the job components of common tasks and MOS-specific tasks.<br />

At the same time, the cognitive aspects of supervisory activities seem to be<br />

related to observed supervisory skill (ratings) to a greater degree than to<br />

job samples of supervisory behaviors. It appears that, for common and MOS<br />

tasks, knowing how to perform and being able to perform are more highly related than either of those is to actually performing on the job. However,

545


for the less easily defined and analyzed supervisory performance, knowing effective ways to supervise and being rated as a good supervisor are more highly related than either of those is to demonstrating supervisory skills on

a role-play.<br />

Discussion<br />

Hands-on job sample tests and written job knowledge tests are frequently<br />

used in military performance measurement situations. Whenever we have well-defined

tasks, with unequivocal task analyses that include the initiating cues<br />

and performance standards and that permit the identification of correct and<br />

incorrect actions, we can construct job knowledge tests or job sample tests.<br />

(Whether or not the tests are administrable within available or reasonable<br />

resources is another issue.) These types of tests are widely used because<br />

what they measure -- declarative and procedural knowledge, ability to<br />

perform -- is fairly well-understood. However, the assessment of "typical"<br />

performance (as opposed to ability or knowledge) is more difficult, and the<br />

use of anchored rating scales provides us a method that is arguably less<br />

precise -- but so is the target behavior less precise. Measurement of<br />

supervisory skills has long been regarded as difficult at best. Like<br />

"leadership," these skills are often referred to as "intangible," as though we<br />

are unsure of their existence. The situational judgment tests and the role-plays

are, however, measuring something, and with a respectable degree of<br />

reliability. Continued attention to the development of these instruments, and<br />

to ways of assessing their dimensionality, should yield useful information to<br />

the military testing community.<br />

References<br />

Campbell, J. P. (Ed.). (October 1987). Improving the Selection, Classification, and Utilization of Army Enlisted Personnel: Annual Report, 1987 Fiscal Year (HumRRO Report IR-PRD-88-18). Alexandria, VA: Human Resources Research Organization.

Campbell, J. P. (Ed.). (in preparation). Building the Career Force: First Year Report. Alexandria, VA: Human Resources Research Organization.

546


The USAF Occupational Measurement Squadron:<br />

Its Organization, Products, and Impact<br />

Joan T. Brooks<br />

William J. Carle<br />

Johnnie C. Harris<br />

Paul P. Stanley II<br />

Joseph S. Tartell<br />

USAF Occupational Measurement Squadron<br />

The USAF Occupational Measurement Squadron (USAFOMS) represents the operational

application of two major thrusts in industrial psychology in the Air<br />

Force: personnel testing and occupational analysis. Each of the USAFOMS's

four major programs reflects in its own way how these important technologies,<br />

which began as research efforts, have been applied to real-world problems to<br />

support Air Force mission accomplishment. Out of personnel testing grew the<br />

USAFOMS’s Occupational Test Development Program and the Professional Development<br />

Program. Out of occupational analysis grew the Occupational Analysis<br />

Program and the Training Development Services Program.<br />

A Brief History of the Squadron<br />

In 1970, the implementation of the Weighted Airman Promotion System (WAPS)<br />

triggered the establishment of a new organization within the headquarters of<br />

the Air Training Command (ATC), with the cryptic title of “Detachment 17.”<br />

Detachment 17 consisted of two branches, one responsible for test development,<br />

the other for occupational analysis. In 1974, the Air Force-wide<br />

impact of this organization's missions was recognized when it became the USAF<br />

Occupational Measurement Center. In October 1990, the unit, which is located<br />

at Randolph Air Force Base, Texas, was renamed the USAF Occupational Measurement<br />

Squadron. The USAFOMS Commander also sits on the staff of the Deputy<br />

Chief of Staff for Technical Training as the Director of Occupational Mea-<br />

surement.<br />

The Occupational Test Development Program

In the 1950s and 1960s, pencil-and-paper tests were mainly used in training

programs, to assess trainee progress. The implementation of WAPS, however,<br />

made tests a critical factor in enlisted career progression.<br />

The idea of WAPS was to take the mystery out of the promotion system by<br />

making every aspect visible to those competing for promotion. Under WAPS,<br />

airmen compete for promotion to the ranks of staff sergeant (E-5) through

master sergeant (E-7) with other airmen in the same Air Force specialty (AFS)<br />

on the basis of a single score. This single WAPS score is the sum of six<br />

component measures (see Table 1), with USAFOMS tests accounting for up to 44% of the total. Most airmen take two tests: the Specialty Knowledge Test (SKT) measures knowledge of the Air Force specialty and the Promotion Fitness

Examination (PFE) tests knowledge of general military subjects. Because the<br />

other, non-test factors typically do little to disperse promotion competi-<br />

tors, the SKT and PFE are often the deciding factor in determining who gets<br />

promoted.<br />

547


Table 1. Weighted Airman Promotion System Factors

FACTOR                           MAXIMUM POINTS    PERCENTAGE VALUE
SKT Score                              100               22%
PFE Score                              100               22%
Time in Service                         40                9%
Time in Grade                           60               13%
Enlisted Performance Ratings           135               29%
Awards and Decorations                  25                5%
TOTAL                                  460              100%
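To make the weighting concrete, the small sketch below sums the six WAPS components and shows the share contributed by the two USAFOMS tests. The maximum points come from Table 1 (factor names abbreviated); the example airman's point values are hypothetical.

# Maximum points per WAPS factor, from Table 1.
MAX_POINTS = {
    "SKT": 100, "PFE": 100, "Time in Service": 40,
    "Time in Grade": 60, "EPR": 135, "Decorations": 25,
}

def waps_total(points):
    """Sum the six component scores into the single WAPS score."""
    return sum(points[f] for f in MAX_POINTS)

# The two USAFOMS tests can contribute at most 200 of 460 points (about 44%).
test_share = (MAX_POINTS["SKT"] + MAX_POINTS["PFE"]) / sum(MAX_POINTS.values())
print(round(test_share, 3))  # 0.435

# Hypothetical airman: the test scores dominate the spread between competitors.
example = {"SKT": 82, "PFE": 75, "Time in Service": 36,
           "Time in Grade": 42, "EPR": 130, "Decorations": 10}
print(waps_total(example))  # 375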

Each promotion test is revised annually in order to prevent compromise and<br />

keep abreast of technological or procedural changes. The tests are con-<br />

structed using the content validity strategy of test development. Three<br />

sources of information are the foundation of content validity for the tests:<br />

the training standard, which lists the specialty's common duties and tasks;<br />

the occupational analysis data provided by USAFOMS’s own Occupational Analysis<br />

Program, which show the relative importance of tasks performed by job<br />

incumbents; and, most important, the experience and knowledge of the sub-<br />

ject-matter experts (SMEs) brought in to write the tests.<br />

The SMEs are senior NCOs selected from throughout the Air Force on the basis<br />

of their job experience in their respective career fields. "Tests Written by<br />

Airmen for Airmen" is a slogan which accurately sums up the USAFOMS test<br />

development philosophy, because these senior NCOs are the heart of the test-writing

process. They provide the technical expertise and USAFOMS psychologists<br />

provide the psychometric expertise to produce job-relevant and statis-<br />

tically sound tests.<br />

While at USAFOMS, each group of SMEs is assigned a test psychologist to lead<br />

them through the test development process. A quality control psychologist<br />

acts as an additional set of eyes, performing exhaustive and minute scrutiny<br />

of all team output. A group of eight test management psychologists oversees<br />

the test development effort as a whole: identifying testing requirements for<br />

their assigned career fields, closely monitoring all events which may affect<br />

testing, ensuring that qualified SMEs are selected, providing guidance to<br />

test writers, and ultimately assuming overall responsibility for the tests<br />

developed.<br />

SMEs spend from 2 to 6 weeks at USAFOMS, depending on the type of test and<br />

the extent of test revision involved. During this time, their questions are<br />

thoroughly researched and reviewed. Each team member has veto power over<br />

each test item. After the SMEs leave, each test is subjected to an additional<br />

20 steps of quality control. The final product is a camera-ready test<br />

manuscript prepared with computerized photocomposition equipment. The manuscript<br />

is forwarded through the Air Force publications distribution system to<br />

be printed and disseminated worldwide through the network of Air Force test<br />

control officers.<br />

548


In addition to SKTs and PFEs, USAFOMS produces USAF Supervisory Examinations<br />

(USAFSEs) and Apprentice Knowledge Tests (AKTs). USAFSEs assess general<br />

supervisory and managerial knowledges and are used in the Senior NCO Promotion

Program, a board-based system used to make selections for promotion to<br />

the ranks of senior master sergeant (E-8) and chief master sergeant (E-9).<br />

AKTs measure the knowledge required for possession of the 3-skill level (also<br />

called apprentice level) of training. An airman with documented civilian

experience in a specialty may be allowed to bypass resident technical training<br />

with a passing score on the AKT, thus saving the Air Force valuable<br />

training dollars.<br />

In 1989, 700 SMEs were sent TDY to USAFOMS to develop a total of 418 tests.<br />

The Professional Development Program

This program, though not strictly an outgrowth of the field of industrial<br />

psychology like the others of USAFOMS, has had an important positive effect<br />

on acceptance of USAFOMS’s promotion tests. It is Air Force policy that<br />

promotion tests be developed entirely from references that will be available<br />

to all examinees for study. Before 1980, this was a problem with USAFOMS's

most highly visible tests, the Promotion Fitness Examinations and USAF Supervisory<br />

Examinations. These tests were written from a variety of references<br />

which varied in quality and availability. The Professional Development<br />

Program was established to develop a single, high-quality reference upon<br />

which these critical promotion tests could be based.<br />

The reference which evolved was Air Force Pamphlet 50-34. Volume I of the<br />

pamphlet is now the sole source reference for airmen taking the Promotion<br />

Fitness Exam to compete for promotion to staff sergeant, technical sergeant,<br />

and master sergeant. Airmen competing for promotion to senior master ser-<br />

geant and chief master sergeant study both Volume I and Volume II in preparing<br />

to take the USAF Supervisory Exam.<br />

The Occupational Analysis Program

In the early 1960s, research performed by the Air Force was to influence

profoundly the field of industrial psychology. Occupational analysis had<br />

been around for many years in various forms, but it was the Comprehensive<br />

Occupational Analysis Data Analysis Programs (collectively called CODAP)<br />

developed by the Personnel Research Laboratory which made possible the study<br />

of jobs on the scale necessary to work with career fields the scope of those<br />

in the Air Force. In 1967, the Job Specialty Survey Division was formed to<br />

apply this technology in the operational setting. It was part of what was<br />

then called Lackland <strong>Military</strong> Training Squadron until Detachment 17 was<br />

formed in 1970.<br />

People in the Occupational Analysis Program conduct surveys of AF personnel,

both military and civilian, to learn what tasks they do regularly on the job.<br />

The Air Force uses the survey results for refining and maintaining occupational<br />

structures within a classification system, for constructing enlisted

promotion tests, for adjusting or establishing training programs, and for<br />

sustaining or modifying other Air Force personnel and research programs. The<br />

occupational survey process consists of six distinct phases, beginning with<br />

the receipt of a request for an occupational survey. Requests for surveys<br />

549


are reviewed by the Priorities Working Group (PWG). In addition to USAFOMS<br />

personnel, the PWG consists of representatives from the Air Force Deputy<br />

Chief of Staff for Personnel, the Air Force Human Resources Laboratory<br />

(AFHRL), the Air Force Military Personnel Center (AFMPC), and the ATC technical

and medical training staffs. The PWG selects those specialties which<br />

will be surveyed and assigns relative priorities.<br />

The next step is the development of a job inventory. The job inventory<br />

consists of a comprehensive listing of tasks which may be performed in a<br />

particular occupational field. Inventory developers travel to operational<br />

bases as well as ATC technical training centers for exhaustive interviews<br />

with subject-matter experts. From these interviews, they compile the task<br />

listing and publish it along with background questions as the USAF Job Inventory<br />

for the occupational field under study.<br />

The job inventory is then administered to job incumbents, usually through the<br />

personnel office at each installation. The returned job inventory booklets<br />

undergo a quality control review to correct or eliminate those which have<br />

been improperly completed. Each booklet is reviewed for accuracy and com-<br />

pleteness. This careful quality control of the returned booklets ensures<br />

that the data received are accurate.<br />

Once the booklets are quality controlled, data processing personnel use an<br />

optical scanner to input task responses and background data from returned<br />

inventories into the computer. Computer programming personnel then apply<br />

CODAP programs to create job descriptions and other related products to aid<br />

in data analysis.<br />

Occupational analysts then spend considerable time analyzing the data and<br />

reporting significant trends and implications. USAFOMS publishes the find-<br />

ings and results of the analysis in the form of an Occupational Survey Report<br />

(OSR). The OSR and related data packages are made available to Air Staff,<br />

major commands (MAJCOMs), classification and training personnel, and other<br />

interested Air Force agencies.<br />

The critical final step in the occupational survey process involves working<br />

with the users to apply the data to their particular situation. During this<br />

step, the analyst introduces the user to the data products and gives specific<br />

guidance on how to use the data printouts in making decisions. Once the data<br />

have been analyzed and the OSR has been written and released, the data are<br />

used in a variety of ways. Classification personnel look at career field<br />

structuring, to validate the present structure or recommend restructuring.<br />

USAFOMS psychologists rely heavily on the data to establish the content<br />

validity of enlisted promotion tests. USAFOMS training analysts also use the<br />

data for systems analyses, task analyses, and assessment of education and<br />

training requirements. But, perhaps the most visible use of the OSR data to<br />

date is in determining training requirements. In today’s environment, where<br />

the training dollar is tight, training must be geared only to what the person<br />

will need to do the job effectively. In this regard, the emphasis today is<br />

placed on determining how job incumbents will be used in the first job assignment,<br />

identifying those tasks for which the probability of performance by<br />

airmen in their first assignment is high, and providing initial training on<br />

these tasks. OSR data are the key to designing initial courses that train<br />

550


only for the first job, as well as providing valuable information for what to<br />

include in follow-on training.<br />

The Training Development Services Program

The Training Development Services Program was established in 1982 to improve<br />

Air Force training by using a systematic approach to training development.<br />

The program goal is to enable customers to provide "Quality Training for a<br />

Quality Force.” Training analysts are located at Randolph AFB and at each of<br />

the six technical training centers. Their primary function is to provide<br />

front-end task and training analysis to support Air Force instructional<br />

system development (ISD) requirements. The analysis focuses mainly on the<br />

second step of ISD, "Define Education and Training Requirements." The end

result is an analysis of the training requirements of an Air Force specialty<br />

and a plan for structuring and integrating all training within that special-

ty.<br />

The primary product of the Training Development Services Program is the<br />

Training Requirements Analysis (TRA). This document consists of three sec-<br />

tions:<br />

1) Systems Overview. This section provides the user with background<br />

information on the specialty with special emphasis on training needs and<br />

issues. This section lists all training presently available and points out<br />

anticipated changes within the career field such as the acquisition of new<br />

equipment. Data for this section come from the Air Force Military Personnel

Center, functional managers, training managers, and other staff-level organizations.<br />

2) Comprehensive Task Analysis. Each important task of the specialty is<br />

broken down into the skills and knowledge required to do the task. Also<br />

included are the tools, equipment, references, conditions, and performance<br />

standards for each task. This information is obtained through extensive<br />

one-on-one interviews with individuals who are fully qualified in the specialty.<br />

3) General and Specific Training Recommendations. The general recommen-<br />

dations relate to broad training issues such as the development of a new<br />

course or the merger of two or more specialties. On the other hand, specific<br />

recommendations are given task by task and describe where and when a task<br />

should be trained based on field data, task analysis data, and occupational<br />

survey data. The “where” is typically either at a technical training center<br />

or through on-the-job training. The “when” indicates whether a task should<br />

be taught during entry-level training or at a later time in a person’s ca-<br />

reer. Specific training recommendations are often produced in the form of a<br />

proposed Specialty Training Standard; however, the format is varied to meet<br />

the needs of the user.

A TRA begins with a request from a specialty representative, usually at the<br />

Air Staff or MAJCOM level. TRAs may be requested in conjunction with an<br />

occupational survey (a product of the Occupational Analysis Program) or as a<br />

follow-on to a previous survey in order to address specific training issues<br />

and concerns. Approved TRAs are listed in the USAF Program Technical Training<br />

document. After approval, a team of training analysts is assigned the

project. Usually analysts from more than one location will work on a<br />

551


project in order to reduce travel costs. Data gathering involves extensive<br />

interviews and observation of skilled specialty technicians. Analysts will<br />

draw from the experience of specialty instructors at technical training<br />

centers and will also travel to various MAJCOMs and bases that employ personnel<br />

in the specialty under study. Specific locations are determined through<br />

meetings with functional managers and will include enough bases to ensure a<br />

thorough sampling. Travel is confined to the continental United States<br />

unless unique specialty units are located overseas. The detailed task infor-<br />

mation gathered during these trips is collected with laptop computers and<br />

entered into automated files. The data then become the basis for the train-<br />

ing requirements summarized in the TRA.<br />

Analysts track the results of each TRA through an extensive external evalua-<br />

tion program that routinely surveys product recipients. User information<br />

received over the past two years indicates TRAs are extremely useful and<br />

serve many purposes. Enlisted personnel account for 80% of the users which<br />

is understandable since most analyses are conducted on enlisted specialties.<br />

Civilians account for 15% while officers account for the remaining 5%. TRAs<br />

are typically used to develop or revise OJT programs, produce specialty

training standards and criterion objectives, justify the procurement of<br />

training resources, and standardize training programs. TRAs have also been<br />

used to support career field mergers, determine cross utilization of training<br />

programs, and to support Utilization and Training Workshops (U&TWs).<br />

Conclusion<br />

USAFOMS programs impact virtually every aspect of today's Air Force: determining service entry criteria, setting aptitude requirements for occupational specialties, establishing criteria for job-specific training programs,

and providing the foundation for a fair and objective promotion system. The<br />

future holds new challenges as well, including the possibility of an on-line<br />

occupational information system which permits analysis across weapon systems,<br />

training information which supports both large- and small-scale programs<br />

which are job-specific and cost-effective, and continued improvement of the<br />

promotion system. Key to all future developments is the recognition that<br />

success has come from the operational application of research in industrial<br />

psychology and improvements must follow a similar track: research and validation<br />

prior to implementation.<br />

552


The Examiner is a sophisticated computer-based system used in the development of both paper and pencil and computer-delivered examinations. Over 200 installations world-wide make use of the system in applications ranging from traditional classroom tests to the evaluation of sailors in submarines at sea. This paper will give a brief history of The Examiner, describe the structure of the system, and give some suggested implementations.

The Evolution of The Examiner

A number of expensive and complex mainframe testing systems existed in 1984 when Dr. Stanley Trollip and I decided to put our development and programming experience to work to create a microcomputer-based testing system. We developed a small prototype system and showed it to a number of prospective customers in hope of receiving funding to develop it into a full-fledged program.

Through some business acquaintances in England, we learned that the London Stock Exchange was overhauling their centuries-old brokerage system and were moving towards a certified representative system similar to what we have here in the United States. As part of this process the Exchange decided that they wanted a comprehensive computer-based testing system developed to meet its certification needs. The design criteria they specified were:

The system had to be secure. Item bank encryption and database password access had to ensure
that the test items did not "escape".

The system had to be reliable. Accuracy in testing was important for its own sake. Further, regulations
in the UK made it essential that there be no errors in recording answers or reporting grades of
examinees.

The system had to be easy to use. From a development standpoint, the system should be easily
operated by clerical staff. From the examinee standpoint, computer neophytes must not find the
software impairing their test-taking ability in any way.

Dr. Trollip and I convinced the Exchange that we could develop a software product for them that would meet
their needs, and that we could develop it for them on budget and in time for their "Big Bang" deregulation in
the Fall of 1985. We succeeded, and the software product The Examiner was first used in October of
1985. Since then, every stockbroker in the United Kingdom and Ireland has been certified using our system.
Over 1000 tests a year have been given.

The Examiner is designed to:

Produce both computer-delivered and traditional paper and pencil tests from a single database.
Provide a framework that can produce simple spot quizzes or complex qualification tests.
Provide item and test feedback allowing the integration of training into the examination process.
Track item statistics for the improvement of item bank quality.
Track examinee statistics for individual and class reporting purposes.

553



Item Editor: The item bank is created in this part of the system. The structure of the database is
developed to allow for accurate testing of different subject areas.

Exam Editor: Once the item bank has been created, the examination editor is used to create "profiles" that
will be used as templates to create tests.

Exam Delivery: Tests are delivered in a secure environment with a user interface designed to allow the
assessment of examinee knowledge rather than test-taking ability. Paper and pencil tests are
cleanly printed for maximum clarity and legibility.

Statistics: Completed examination records can be viewed for both examinee results and item analysis.

The Item Bank Structure

One of the great powers of The Examiner is that the software naturally leads the developer into organizing
items into a logically-structured item bank. This rationally structured item bank allows the construction of
tests that can evaluate specific learning objectives or broad knowledge areas.



The Examiner supports eight item types. In addition to conventional Multiple Choice items, the following
types are available:

Multiple Correct: Up to 10 alternatives are available, each alternative having its own grading weight, with
item mastery based on achieving a set sum of correct alternatives.

Dynamic Multiple Choice: Up to 10 alternatives are available. Items are constructed dynamically at
examination generation time by selecting a preselected number of correct and incorrect alternatives.

Dual: A special form of multiple choice item created at examination generation time from a set of four
alternatives, two correct and two incorrect.

Short Alpha Answer: Up to ten words can be judged at one time. Misspellings, extra words, incorrect
word order, and capitalization errors can be allowed or disallowed at will.

Short Numeric Answer: A floating point number can be judged. Exact matching, or plus and minus an
absolute number or percentage of error, can be allowed.

Linked: Up to 99 items can be linked together into a "scenario" type of item. Mastery of the linked item
can be based on mastery of all or part of the included items.

Parallel: Up to 99 items can be grouped together as a parallel item. At examination generation time, the
system will randomly select one of the parallel items for inclusion in the examination.

The means of examination delivery will often determine the type of questions that are to be used.
If computer-delivered or manually-graded paper and pencil examinations are to be used, then any of these
item types can be used. If machine-graded paper and pencil examinations are anticipated, then multiple
choice is the usual choice.
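The Short Numeric Answer judging rule described above (exact match, or a tolerance expressed as an absolute amount or as a percentage of the key value) can be sketched in a few lines of code. This is only an illustration of the rule; the function and parameter names are hypothetical and are not part of The Examiner.

```python
def judge_numeric(answer: float, key: float,
                  abs_tol: float = 0.0, pct_tol: float = 0.0) -> bool:
    """Judge a Short Numeric Answer style response.

    abs_tol -- allowed error as an absolute amount (0 means exact match)
    pct_tol -- allowed error as a percentage of the key value
    The response is accepted if it falls within the larger of the two tolerances.
    """
    allowed = max(abs_tol, abs(key) * pct_tol / 100.0)
    return abs(answer - key) <= allowed

# Example: accept 9.81 plus or minus 2% for a physics question.
print(judge_numeric(9.75, 9.81, pct_tol=2.0))   # True
print(judge_numeric(10.25, 9.81, pct_tol=2.0))  # False
```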

Item Entry<br />

An integrated word-processor allows clerical staff to enter items into The Examiner. A judicious use of
pre-formatting enables The Examiner editor to produce correctly entered items every time.

If an item bank is already in existence, a Text Import Utility can be used to load items into an Examiner
database. Mainframe-based item banks can be successfully migrated into the Examiner's environment with a
minimum of expensive "hands-on" intervention.

Examination Development

Once an item bank has been created, the developer can create examinations from all or part of the bank.
Item selection can range from the selection of specific items to random selection.

The Examiner is unique in its use of profiles. A profile is a set of directives that tells the Examiner
test-generating software how to extract items out of the database and create an examination. The Examiner
doesn't store tests. Rather, it stores profiles that allow tests to be created on demand by accessing the
profile. This makes Examiner databases totally self-contained, allowing the creation of unique, yet
equivalent, examinations at any time.

Profiles contain two main sets of specifications:

555



Global Specifications: These are sets of parameters that affect things such as the number of items to be
shown in the examination, the pass mark for the examination, the difficulty of the
examination, and the way that the examination is to appear to the examinee.

Selection Specifications: These determine how items are drawn from the item bank. At the simplest, a
profile may call for nothing more than random item selection and random presentation of multiple-choice
alternatives. At the other end, the profile for a complex certification examination can be created that will
yield a test of a given difficulty level to test very specific areas of knowledge.
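As a rough illustration of these two sets of specifications, a profile can be thought of as a small declarative record. The field names below are hypothetical and are meant only to mirror the description above, not The Examiner's actual file format.

```python
from dataclasses import dataclass, field

@dataclass
class SelectionRule:
    """One directive telling the generator how to draw items."""
    area: str           # objective classification, e.g. "2.1.0"
    count: int = 1      # how many items to draw at random from that area
    specific: str = ""  # a specific item id, if the rule is "always include"

@dataclass
class Profile:
    """A profile: global specifications plus selection specifications."""
    # Global specifications
    title: str
    num_items: int
    pass_mark: float                 # e.g. 0.70 for a 70% pass mark
    randomize_order: bool = True
    # Selection specifications
    rules: list[SelectionRule] = field(default_factory=list)
```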

[Figure: a hierarchical item bank. The root, "All Questions in Database", branches into the subject areas
History (1.0.0) and Geography (2.0.0). History contains People (1.1.0) and States (1.2.0); Geography contains
Cities (2.1.0), Seas (2.2.0), and Islands (2.3.0). Individual questions (1.1.1, 1.1.2, 1.2.1, 1.2.2, 2.1.1, 2.1.2,
2.1.3, 2.2.1, 2.2.2, 2.3.1) sit beneath these areas.]

The above illustration gives an example of the type of sophisticated item selection criteria that can be used in
Examiner profiles. In this example, the profile has been designed so that a six item test will be generated.
The item selection criteria have been set so that:

1) All examinees will get item 1.1.1.

2) Two items will be selected from area 2.1.0. In this case, the random selection process selected 2.1.2
and 2.1.3. It could just as well have been two other items from 2.1.0.

3) Everyone will get item 2.3.1.

4) The rest of the test will be completed with items from 1.2.0. In this case, items 1.2.1 and 1.2.2 were
selected.

In addition to specifying item selection by objective classification, profiles can specify selection by difficulty,
item type, and item characteristic. The Examiner will attempt to produce a test that matches the
requested characteristics as closely as possible.
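To make the selection mechanics concrete, the sketch below reproduces the six-item example from the illustration: always-include rules for items 1.1.1 and 2.3.1, two random items from area 2.1.0, and the remaining slots filled from area 1.2.0. The toy item bank, the rule format, and the function name are illustrative assumptions, not The Examiner's internal representation.

```python
import random

# A toy item bank keyed by objective classification (area -> item ids).
ITEM_BANK = {
    "1.1.0": ["1.1.1", "1.1.2"],
    "1.2.0": ["1.2.1", "1.2.2"],
    "2.1.0": ["2.1.1", "2.1.2", "2.1.3"],
    "2.2.0": ["2.2.1", "2.2.2"],
    "2.3.0": ["2.3.1"],
}

def generate_test(num_items, always_include, area_rules, fill_area, seed=None):
    """Draw a test of num_items items from the bank.

    always_include -- item ids every examinee must receive
    area_rules     -- {area: count} pairs drawn at random from each area
    fill_area      -- area used to fill any remaining slots
    """
    rng = random.Random(seed)
    test = list(always_include)
    for area, count in area_rules.items():
        test.extend(rng.sample(ITEM_BANK[area], count))
    remaining = num_items - len(test)
    pool = [i for i in ITEM_BANK[fill_area] if i not in test]
    test.extend(rng.sample(pool, remaining))
    return test

# The six-item example: 1.1.1 and 2.3.1 always appear, two items come from
# area 2.1.0, and the rest are drawn from area 1.2.0.
print(generate_test(6, ["1.1.1", "2.3.1"], {"2.1.0": 2}, "1.2.0", seed=1))
```

Because every test is regenerated from the profile rather than stored, running the generator twice with different seeds yields unique but structurally equivalent examinations, which is the behavior the profile mechanism is designed to provide.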

Examination Delivery<br />

The Examiner is unique in its ability to produce both computer-delivered and paper and pencil examinations
from the same item bank. The developer is given the ability to select the delivery mode that is the most
appropriate for their testing needs.

556


Paper and Pencil Testing

The Examiner can easily produce multiple forms of paper and pencil tests. By the introduction of random
selection options into the profile used to generate the test, a user can produce two statistically equivalent
tests on the same subject area. If desired, a unique test could be generated for each examinee. Paper and
pencil tests can be scored in three fashions:

Manually: Student and instructor's answer keys can be printed with each test. In many
instances, hand grading using these forms is acceptable.

Stand-Alone Machine Graded: A "pre-slugged" answer sheet compatible with the stand-alone Scantron 888
optical scanner can be produced on an HP LaserJet II printer. This allows the
easy scoring of multiple examinations on readily available hardware.

Data-Terminal Scanning: An answer key is stored internally in The Examiner's database and can be used
to grade examinee answer sheets. Complete examination and item statistics are
stored when data terminal scanners are used. At present, the Scantron
1300/1400 series scanners are supported. In 1st Quarter 1991, support for the
Scantron 8000/8200 series scanners will be added.

Print options of The Examiner are currently being enhanced, and these new features will be released in the
first quarter of 1991. Supported features will be:

Printers: Initial support for 20 of the most common printers. On customer request, the printer
support files will be expanded to include additional printers.

Fonts: Depending on the printer, font control will be added. With the HP LaserJet printer,
numerous font cartridges will be supported in addition to the default internal fonts.

Highlighting: Bold, italic, underlining, superscripts, and subscripts will be available on printers that
support those features.

Graphics: Printed items will include PC-Paintbrush™ images that are currently only available with
computer-delivered examinations. Printing will be limited to those printers that support
graphic printing.

Suggested Implementations

Paper and pencil tests can be made available to examinees under a number of different delivery environments.
The "unbundling" of parts of The Examiner makes it possible to have non-technical clerical staff
produce on-demand tests at remote sites. Three possible test creation/delivery scenarios are:

Local Control: Tests are created using the main Examiner system and are graded at the development site.
Copies of the item bank remain secure in one place and test creation control is tightly
limited. Tests can be mailed to remote sites and then returned to the central site for
grading.

Networked: Using the network version of the stand-alone examination generator, remote sites can locally
generate tests and score them. Without access to the editing programs, the security of the
database is maintained while providing the convenience of simultaneous multiple access to
the items.

557



Remote: Copies of the database are distributed to the remote sites, and the stand-alone examination
generator is used to produce the tests. Grading is done locally.

Of course, numerous variations on these basic themes are possible to offer a delivery environment
appropriate for the unique delivery requirements.

Computer-Delivered Testing

Examinations can be delivered via computer using The Examiner's administration software. Sample
examinations allow the examinee to become familiar with the testing software so that the administration
system tests knowledge rather than computer test-taking ability.

Basic options available within the administration system are:<br />

Sequencing: Examinees can be required to answer each item before they see the next item, and are not
allowed to review and change their items. Or, examinees can move within the examination
at will, changing their answers until complete examination scoring is requested.

Feedback: Real-time student mastery feedback can range from none at all to detailed feedback at the
multiple-choice alternative level. Full-test mastery criteria and results can be activated when
appropriate.

Randomization: Item presentation order can be random or fixed. Within items, multiple choice alternative
order can be randomized.
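One way to picture these three options is as a small delivery-configuration record handed to the administration software. The structure below is a sketch under that assumption; the field names are hypothetical and do not reflect The Examiner's actual configuration format.

```python
from dataclasses import dataclass

@dataclass
class DeliveryOptions:
    """Administration options for a computer-delivered examination."""
    lock_step: bool = False          # True: answer each item before seeing the next
    allow_review: bool = True        # examinee may revisit and change answers
    feedback: str = "none"           # "none", "item", or "alternative" level feedback
    show_final_mastery: bool = False # report full-test mastery criteria and results
    randomize_items: bool = True
    randomize_alternatives: bool = True

# A strict certification-style delivery: lock-step, no feedback until the end.
cert_delivery = DeliveryOptions(lock_step=True, allow_review=False,
                                feedback="none", show_final_mastery=True)
```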

Examiner tests are DOS files and can be moved from one machine to another by any number of methods.
Floppy disks, local area networks, and distributed communication networks are all possible.


The Examiner is a sophisticated computer-based examination system that can meet most testing needs of
both large and small organizations. Its ability to deliver both paper and pencil and computer-based
examinations from the same database gives it a unique power in an area where testing needs can sometimes
change with great rapidity. Evolving to meet users' needs, The Examiner is a cost-effective off-the-shelf
solution for the evaluation of examinees and students.

For information contact:

Media Computer Enterprises, Ltd.
880 Sibley Memorial Highway, Suite 102
Mendota Heights, MN 55118-1708 USA

Phone: 612-451-7360<br />

FAX: 612-451-6563<br />

559



32ND ANNUAL CONFERENCE OF THE MILITARY TESTING ASSOCIATION<br />

ORANGE BEACH, ALABAMA<br />

5-9 NOVEMBER 1990<br />

Minutes of the Steering Committee Meeting<br />

5 November 1990<br />

The meeting of the Steering Committee for the 32nd Annual<br />

Conference of the Military Testing Association was held in the

Sand Castle (1) Room of the Perdido Hilton Hotel, Orange Beach,<br />

Alabama.

MEMBERS AND ATTENDEES. See the List of Steering Committee

Meeting Attendees, which follows these minutes as Attachment 1.<br />

1. The meeting was called to order at 0930 hours by CDR M. R.<br />

Adams, 1990 Chairperson.<br />

2. The financial report reflected a sharp impact from the recent<br />

economic budget problems: last year over 350 people were<br />

registered and for 1990, 300 were expected. Because of the<br />

sizeable funds passed to NETPMSA from the 1989 hosts and the<br />

expected number of attendees, the registration costs were<br />

substantially reduced. There have been many cancellations and<br />

the current estimate of registered attendees is 140. (As of 9

November, there were 165 registered attendees.)<br />

3. Future conference locations were discussed:<br />

(a) The Federal Republic of Germany received a letter signed<br />

by OASD (FM&P) stating that support for U.S. participation at the<br />

'91 MTA Conference would be given. However, NETPMSA, as the '90<br />

MTA host, recommended review of the next site due to the budget<br />

difficulties encountered and the financial planning problems for<br />

the host. Several discussions followed regarding funding<br />

cutbacks expected and difficulties in promising good attendance
numbers in Germany. Until the testing/research budgets finalize,
it was agreed to defer Germany as the guest host. A Spring
versus Fall time period was also discussed, but the group decided
to leave the conference as an expected Fall budget item, even if

attendance went down. The USAF Occupational Measurement Squadron<br />

tentatively agreed to host the 1991 MTA Conference in San<br />

Antonio, Texas, site of the 1989 conference. (That 1991 site was<br />

confirmed on 6 November 1990).<br />

(b) The Navy Personnel Research and Development Center will<br />

host the 1992 MTA Conference in San Diego, California.<br />

560


(c) The Coast Guard will host the 1993 MTA Conference in<br />

Williamsburg, Virginia.<br />

(d) The Federal Republic of Germany will host the 1994 MTA<br />

Conference in Germany in conjunction with Naval Research in<br />

London, England.

(e) Canada will host the 1995 MTA Conference.<br />

4. There was general discussion on the submission of abstracts<br />

and the difficulty in getting them in a timely way. Many members
felt the Steering Committee members should be more forceful in
the association and possibly require that a committee member
screen presentations. This would assist with quality and timeliness.

Some members felt there should be greater recruitment for topics<br />

from the production/development areas since research is already<br />

so well represented.<br />

5. Regarding the Harry H. Greer Award, there was discussion<br />

about an Awards Committee being established, as mentioned in the<br />

charter, to provide more structure and coverage in getting more<br />

good nominations. The general opinion was that the current<br />

method of presenting nominations to the current chairman for<br />

further opinion is sufficient. However, the committee members<br />

all agreed that nominations should be specific in detail

regarding the currency and degree of the nominee's involvement<br />

with the Military Testing Association: professional contributions

in research/production; published material, etc.<br />

6. There was general agreement that the 1989 carry-over topic of<br />

"MTA name change" should be dropped from future MTA Steering<br />

Committee meeting agendas. This has been a repeated item and the<br />

historic continuity value of the current title is most important.<br />

M. R. ADAMS, CDR, USN<br />

1990 Chairperson<br />

561



Canadian Forces Personnel Applied<br />

Research Unit<br />

National Defence Headquarters<br />

Canadian Forces Directorate of<br />

<strong>Military</strong> Occupational Structures<br />

Federal Ministry of Defense<br />

Federal Republic of Germany<br />

MOD Science 3 (AIR)<br />

Ministry of Defence<br />

United Kingdom<br />

Royal Australian Air Force<br />

Royal Netherlands Army<br />

SEC PSY OND/CRS<br />

Belgian Armed Forces<br />

Naval Education and Training Program<br />

Management Support Activity (NETPMSA)<br />

Naval <strong>Military</strong> Personnel Command<br />

Navy Occupational Development and<br />

Analysis Center (NODAC)<br />

Navy Personnel Research and<br />

Development Center (NPRDC)<br />

Defense Activity for Non-Traditional<br />

Education Support (DANTES)<br />

U.S. Air Force Human Resources Laboratory<br />

U.S. Air Force Occupational Measurement<br />

Squadron<br />

U,S. Army Research Institute (PERI-RG)<br />

U.S. Coast Guard Headquarters (G-PWP-2)<br />

OBSERVERS:<br />

Air Traffic Services Transport Canada<br />

Chief of Naval Operations<br />

1990 MTA STEERING COMMITTEE MEETING ATTENDEES<br />

562<br />


CDR Frederick F.P. Wilson<br />

COL James C. Fleming<br />

Mr. G. J. (Jeff) Higgs

COL Terry J. Prociuk<br />

Martin L. Rauch<br />

(Represented by<br />

LTCOL John Birkbeck,

MOD A ED 4)<br />

Squadron Leader John S. Price<br />

COL Dr. Ger J.C. Roozendaal<br />

CAPT Francois J.M.E. Lescreve<br />

CDR Mary R. Adams<br />

CAPT Edward L. Naro<br />

Dr. Alain Hunter<br />

Mr. William A. Sands<br />

Roger G. Goldberg<br />

Dr. Lloyd D. Burtch

J. S. Tartell<br />

Dr. Timothy W. Elig<br />

Richard S. Lanterman<br />

J. R. Dick Campbell<br />

Mr. Charles R. Hoshaw<br />

Attachment 1


ORGANIZATION<br />

ROYAL AUSTRALIAN AIR FORCE:

Royal Australian Air Force<br />

U.S. Air Force Human Resources<br />

Laboratory (AFHRL/MOD)

Brooks AFB, TX 78235-5000<br />

U S A<br />

A/V 240-3640 COM: (512) 536-3648<br />

BELGIAN ARMED FORCES,<br />

SEC PSY OND/CRS:<br />

SEC PSY OND/CRS<br />

Bruynstraat<br />

B-1120 Brussels<br />

Belgium<br />

2 2680050, Ext. 3279<br />

CANADIAN FORCES DIRECTORATE OF

MILITARY OCCUPATIONAL STRUCTURES:<br />

Canadian Forces Directorate of<br />

<strong>Military</strong> Occupational Structures<br />

National Defence Headquarters<br />

101 Colonel By Drive<br />

Ottawa, Ontario KlA OK2<br />

Canada<br />

MILITARY TESTING ASSOCIATION<br />

STEERING COMMITTEE MEMBERS

AUSTRALIA<br />

BELGIUM<br />

CANADA<br />

563<br />

1990 REPRESENTATIVE<br />

Squadron Leader John S. Price<br />

(Squadron Leader Kerry J. McDonald will
be the 1991 representative)

CAPT Francois J.M.E. Lescreve<br />

COL James C. Fleming<br />

(A/V 642-3507 COM: (613) 992-3507)<br />

Mr. G. J. (Jeff) Higgs

(A/V 842-7069 COM: (613) 922-7069)



ORGANIZATION 1990 REPRESENTATIVE<br />

DIRECTOR OF PERSONNEL PSYCHOLOGY AND SOCIOLOGY:<br />

Director of Personnel Psychology and<br />

Sociology<br />

National Defense Headquarters<br />

Ottawa, Ontario KlA OK2<br />

Canada<br />

Canadian Forces<br />

A/V 842-0244 COM: (613) 992-0244<br />

CANADIAN FORCES PERSONNEL APPLIED RESEARCH UNIT:<br />

Canadian Forces Personnel Applied<br />

Research Unit<br />

4900 Yonge St., Suite 600<br />

Willowdale, Ontario M2N2Z4<br />

Canada<br />

Canadian Forces<br />

COM: (416) 224-4964<br />

FEDERAL MINISTRY OF DEFENSE:

COL Terry J. Prociuk<br />

CDR Frederick F.P. Wilson<br />

FEDERAL REPUBLIC OF GERMANY<br />

Federal Ministry of Defense, P II 4<br />

Postfach 1328

5300 Bonn 1<br />

Federal Republic of Germany<br />

Federal Ministry of Defense<br />

COM: 49-228-128543<br />

FEDERAL REPUBLIC OF GERMANY AIR FORCE:<br />

Federal Republic of Germany Air Force<br />

Wehrbereichsverwaltung II<br />

V-4-Psychology Angelegenheiten<br />

Hans-Blocher-Alee 16 3000 Hannover<br />

05 11-531-26 08126 03<br />

564<br />

Martin L. Rauch<br />

Wolfgang Weber<br />

.-. .


ORGANIZATION 1990 REPRESENTATIVE<br />

ROYAL NETHERLANDS ARMY:<br />

Royal Netherlands Army<br />

DPKL/AFD GW<br />

Postbus 90701<br />

2509 LS The Hague

The Netherlands<br />

COM: 31-71-6135450<br />

MOD SCIENCE 3 (AIR):<br />

MOD Science 3 (AIR)<br />

Lacon House<br />

Theobalds Road<br />

London, WC1X 8RY

England<br />

U.S. AIR FORCE<br />

THE NETHERLANDS<br />


THE UNITED KINGDOM<br />

COL Dr. Ger J.C. Roozendaal<br />

Eugene F. Burke<br />

(Represented in 1990 by<br />

COL John Birkbeck<br />

MOD A ED 4<br />

Court Road<br />

Eltham<br />

London SE9 5NR<br />

United Kingdom)<br />

UNITED STATES OF AMERICA<br />

U.S. AIR FORCE HUMAN RESOURCES LABORATORY<br />

(AFHRL):<br />

U.S. Air Force Human Resources Laboratory<br />

(AFHRL/PR)<br />

Brooks AFB, TX 78235-5601<br />

USA<br />

(A/V 240-3011 COM: (512) 536-3611)<br />

U.S. Air Force Human Resources Laboratory<br />

(AFHRL/CC)

Brooks AFB, TX 78235-5000<br />

USA<br />

565<br />

Dr. Lloyd D. Burtch<br />

COL Harold G. Jensen



ORGANIZATION 1990 REPRESENTATIVE<br />

U.S. AIR FORCE OCCUPATIONAL MEASUREMENT SQUADRON<br />

(OMY):<br />

U.S. Air Force Occupational<br />

Measurement Squadron (OMY)<br />

Randolph AFB, TX 78150-5000<br />

USA<br />

DSN 487-6623 COM: (512) 652-6623<br />

U.S. ARMY RESEARCH INSTITUTE (PERI-RG):<br />

U.S. Army Research Institute (PERI-RG)<br />

5001 Eisenhower Avenue<br />

Alexandria, VA 22333-5600<br />

USA<br />

A/V 354-5786 COM: (703) 274-5610<br />

U.S. COAST GUARD<br />

U.S. COAST GUARD HEADQUARTERS:<br />

U.S. Coast Guard Headquarters<br />

Chief, Occupational Standards<br />

(G-PWP-2), Room 4111

2100 Second St., S.W.<br />

Washington, DC 20593-0001<br />

USA

COM: (202) 267-2986<br />

U.S. NAVY

NAVAL EDUCATION AND TRAINING PROGRAM<br />

MANAGEMENT SUPPORT ACTIVITY (NETPMSA):

Naval Education and Training Program<br />

Management Support Activity (NETPMSA)<br />

(Code 03)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1685 COM: (904) 452-1685<br />


566<br />

J. S. Tartell

Dr. Timothy W. Elig

Richard S. Lanterman<br />

CDR Mary Adams<br />

Dr. James M. Lentz



ORGANIZATION<br />

NAVAL MILITARY PERSONNEL COMMAND NAVY OCCUPATIONAL
DEVELOPMENT AND ANALYSIS CENTER (NODAC):

Naval <strong>Military</strong> Personnel Command<br />

Navy Occupational Development and<br />

Analysis Center (NODAC)<br />

Bldg. 150, WNY (Anacostia)

Washington, DC 20374-1501<br />

USA<br />

NAVY PERSONNEL RESEARCH AND DEVELOPMENT CENTER

(NPRDC):<br />

Navy Personnel Research and<br />

Development Center (NPRDC)<br />

<strong>Testing</strong> Systems Department (Code 13)<br />

San Diego, CA 92152-6800<br />

USA<br />

A/V 553-9266 COM: (619) 553-9266<br />

DEFENSE ACTIVITY FOR NON-TRADITIONAL EDUCATION<br />

SUPPORT (DANTES):

Defense Activity for Non-Traditional<br />

Education Support (DANTES)

Pensacola, FL 32509-7400<br />

USA<br />

A/V 922-1064/1745 COM: 904-452-1063

OFFICE OF ASSISTANT SECRETARY OF DEFENSE FORCE<br />

MANAGEMENT AND PERSONNEL (FM&P):

Office of Assistant Secretary of Defense<br />

Force Management and Personnel (FM&P)<br />

Washington, DC 20301<br />

USA<br />

A/V 227-4166 COM: (202) 697-4166<br />

567<br />


1990 REPRESENTATIVE<br />

CAPT Edward L. Naro<br />

(A/V 288-5488 COM: (202) 433-5488)<br />

Dr. Alain Hunter<br />

(A/V 288-4620 COM: (202) 433-4620)<br />

Mr. William A. Sands<br />

Roger G. Goldberg<br />

Dr. W. S. Sellman


BY-LAWS OF THE MILITARY TESTING ASSOCIATION

Article I - Name<br />

The name of this organization shall be the <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>.<br />

Article II - Purpose

The Purpose of this <strong>Association</strong> shall be to:<br />

A. Assemble representatives of the various armed services of the United States<br />

and such other nations as might request to discuss and exchange ideas concerning<br />

assessment of military personnel.<br />

B. Review, study, and discuss the mission, organization, operations, and research

activities of the various associated organizations engaged in military personnel<br />

assessment.<br />

C. Foster improved personnel assessment through exploration and presentation<br />

of new techniques and procedures for behavioral measurement, occupational<br />

analysis, manpower analysis, simulation models, training programs, selection

methodology, survey and feedback systems.<br />

D. Promote cooperation in the exchange of assessment procedures, techniques<br />

and instruments.<br />

E. Promote the assessment of military personnel as a scientific adjunct to<br />

modern military personnel management within the military and professional<br />

community.<br />

Article III - Participation

The following categories shall constitute membership within the MTA:<br />

A. Primary Membership.<br />

1. All active duty military and civilian personnel permanently assigned to an<br />

agency of the associated armed services having primary responsibility for assessment<br />

for personnel systems.<br />

2. All civilian and active duty military personnel permanently assigned to an<br />

organization exercising direct command over an agency of the associated armed<br />

services holding primary responsibility for assessment of military personnel.

B. Associate Membership.<br />

1. Membership in this category will be extended to permanent personnel of<br />

governmental organizations engaged in activities that parallel those of the primary<br />

membership. Associate members shall be entitled to all privileges of primary<br />

members with the exception of membership on the Steering Committee. This<br />

restriction may be waived by the majority vote of the Steering Committee.

568


C. Non-Member Participants.<br />

1. Non-members may participate in the annual conference, present papers,
and participate in symposium/panel sessions. Non-members will not attend the
meeting of the Steering Committee nor have a vote in association affairs.

Article IV - Dues

No annual dues shall be levied against the participants.<br />

Article V - Steering Committee<br />

A. The governing body of the <strong>Association</strong> shall be the Steering Committee. The<br />

Steering Committee shall consist of voting and non-voting members. Voting

members are primary members of the Steering Committee. Primary membership<br />

shall include:<br />

1. The Commanding Officers of the respective agencies of the armed services<br />

exercising responsibility for personnel assessment programs.<br />

2. The ranking civilian professional employees of the respective agencies of<br />

the armed service exercising primary responsibility for the conduct of personnel<br />

assessment systems.

3. Each agency shall have no more than two (2) representatives.<br />

B. Associate membership of the Steering Committee shall be extended by<br />

majority vote of the committee to representatives of various governmental organizations<br />

whose purposes parallel those of the <strong>Association</strong>.<br />

C. The Chairman of the Steering Committee shall be appointed by the President<br />

of the <strong>Association</strong>. The term of office shall be one year and shall begin the last day of<br />

the annual conference.<br />

D. The Steering Committee shall have general supervision over the affairs of the<br />

<strong>Association</strong> and shall have the responsibility for all activities of the <strong>Association</strong>.<br />

The Steering Committee shall conduct the business of the <strong>Association</strong> in the interim<br />

between annual conferences of the <strong>Association</strong> by such means of communication as<br />

deemed appropriate by the President or Chairman.<br />

E. Meeting of the Steering Committee shall be held during the annual<br />

conferences of the <strong>Association</strong> and at such times as requested by the President of the<br />

<strong>Association</strong> or the Chairman of the Steering Committee. Representation from the<br />

majority of the organizations of the Steering Committee shall constitute a quorum.<br />

Article VI - Officers<br />

A. The officers of the Association shall consist of a President, Chairman of the
Steering Committee, and a Secretary.

569


B. The President of the Association shall be the Commanding Officer of the

armed services agency coordinating the annual conference of the <strong>Association</strong>. The<br />

term of the President shall begin at the close of the annual conference of the<br />

<strong>Association</strong> and shall expire at the close of the next annual conference.<br />

C. It shall be the duty of the President to organize and coordinate the annual<br />

conference of the <strong>Association</strong> held during his term of office, and to perform the<br />

customary duties of a president.<br />

D. The Secretary of the <strong>Association</strong> shall be filled through appointment by the<br />

President of the <strong>Association</strong>. The term of office of the Secretary shall be the same as<br />

that of the President.<br />

E. It shall be the duty of the Secretary of the <strong>Association</strong> to keep the records of<br />

the association, and the Steering Committee, and to conduct official correspondence<br />

of the association, and to insure notices for conferences. The Secretary shall solicit<br />

nominations for the Harry Greer award prior to the annual conference. The<br />

Secretarv shall also perform such additional duties and take such additional<br />

responsibilities as the President may delegate to him.<br />

Article VII - Meetings

A. The <strong>Association</strong> shall hold a conference annually.<br />

B. The annual conference of the <strong>Association</strong> shall be coordinated by the agencies<br />

of the associated armed services exercising primary responsibility for military<br />

personnel assessment. The coordinating agencies and the order of rotation will be<br />

determined annually by the Steering Committee. The coordinating agencies for at<br />

least the following three years will be announced at the annual meeting.<br />

C. The annual conference of the <strong>Association</strong> shall be held at a time and place<br />

determined by the coordinating agency. The membership of the <strong>Association</strong> shall be<br />

informed at the annual conference of the place at which the following annual<br />

conference will be held. The coordinating agency shall inform the Steering<br />

Committee of the time of the annual conference not less than six (6) months prior to<br />

the conference.<br />

D. The coordinating agency shall exercise planning and supervision over the<br />

program of the annual conference. Final selection of program content shall be the<br />

responsibility of the coordinating organization.<br />

E. Any other organization desiring to coordinate the conference may submit a<br />

formal request to the Chairman of the Steering Committee, no later than 18 months<br />

prior to the date they wish to serve as host.<br />

Article VIII - Committee<br />

A. Standing committees may be named from time to time, as required, by vote of<br />

the Steering Committee. The chairman of each standing committee shall be<br />

appointed by the Chairman of the Steering Committee. Members of standing<br />

committees shall be appointed by the Chairman of the Steering Committee in<br />

consultation with the Chairman of the committee in question. Chairmen and<br />

570


committee members shall serve in their appointed capacities at the discretion of the<br />

Chairman of the Steering Committee. The Chairman of the Steering Committee<br />

shall be ex officio member of all standing committees.<br />

B. The President, with the counsel and approval of the Steering Committee, may

appoint such ad hoc committees as are needed from time to time. An ad hoc

committee shall serve until its assigned task is completed or for the length of time<br />

specified by the President in consultation with the Steering Committee.<br />

C. All standing committees shall clear their general plans of action and new<br />

policies through the Steering Committee, and no committee or committee chairman<br />

shall enter into relationships or activities with persons or groups outside of the<br />

<strong>Association</strong> that extend beyond the approved general plan of work without the<br />

specific authorization of the Steering Committee.<br />

D. In the interest of continuity, if any officer or member has any duty elected or<br />

appointed placed on him, and is unable to perform the designated duty, he should<br />

decline and notify at once the officers of the <strong>Association</strong> that he cannot accept or<br />

continue said duty.<br />

Article IX - Amendments

A. Amendments of these By-Laws may be made at any annual conference of the<br />

<strong>Association</strong>.<br />

B. Amendments of the By-Laws may be made by majority vote of the assembled<br />

membership of the <strong>Association</strong> provided that the proposed amendments shall have<br />

been approved by a majority vote of the Steering Committee.<br />

C. Proposed amendments not approved by a majority vote of the Steering<br />

Committee shall require a two-thirds vote of the assembled membership of the<br />

<strong>Association</strong>.<br />

Article X - Voting<br />

All members in attendance shall be voting members.<br />

Article XI - Harry H. Greer Award

A. Selection Procedures:

1. Recipients of the Harry H. Greer Award will be selected by a committee<br />

drawn from the agencies represented on the MTA Steering Committee. The CO of<br />

each agency will designate one person from that agency to serve on the Awards<br />

Committee. Each committee member will have attended at least three previous<br />

MTA meetings. The member from the coordinating agency will serve as chairman of<br />

the committee.<br />

2. Nominations for the award in a given year will be submitted in writing to

the Awards Committee Chairman by 1 July of that year.<br />

571


3. The Chairman of the committee is responsible for canvassing the other<br />

committee members to arrive at consensus on the selection of a recipient of the<br />

award.<br />

4. No more than one person is to receive the award each year, but the award<br />

need not be made each year. The Awards Committee may decide not to select a<br />

recipient in any given year.<br />

5. The annual selection of the person to receive the award, or the decision not<br />

to make an award that year, is to be made at least six weeks prior to the date of the<br />

annual MTA Conference.<br />

B. Selection Criteria:<br />

The recipients of the Harry H. Greer Award are to be selected on the basis of<br />

outstanding work contributing significantly to the MTA.<br />

C. The Award:<br />

The Harry H. Greer Award is to be a certificate normally presented to the<br />

recipient during the Annual MTA Conference. The awards committee is responsible<br />

for preparing the text of the certificate. The coordinating agency is responsible for<br />

printing and awarding the certificate.<br />

Article XII - Enactment<br />

These By-Laws shall be in force immediately upon acceptance by a majority of the<br />

assembled membership of the Association and/or amended (in force 5 November

1990).<br />

572


CDR Mary Adams<br />

Naval Education and Training Program<br />

Management Support Activity (Code 03)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1685 COM: (904) 452-1685<br />

Walter G. Albert<br />

Air Force Human Resources Laboratory/MOD<br />

Brooks AFB, TX 78235-5601<br />

USA<br />

COM: (512) 240-3677<br />


Dr. Cathie E. Alderks<br />

Army Research Institute<br />

5001 Eisenhower Avenue<br />

Alexandria, VA 22333-5600<br />

USA<br />

A/V 284-8293 COM: (703) 274-8293<br />

LTCOL Drs Pieter S. Andriesse

Ministry of Defence<br />

Directorate of RNLAF/Personnel<br />

P.O. Box 20703<br />

2500 ES The Hague

The Netherlands<br />

Jane M. Arabian, Ph.D.<br />

Commander, U.S. Army Research Institute<br />

(Attn: PERI-RR)

Alexandria, VA 22003-5600<br />

USA<br />

A/V 284-8275 COM: (703) 274-8275<br />

Klaus Arndt<br />

German Federal Armed Forces Admin Office<br />

Bonner Talweg 177<br />

D-5300 Bonn<br />

Federal Republic of Germany<br />

PH: (Germany) 228-122076<br />

MAJ Robert L. Ashworth, Jr.

U.S. Army Research Institute<br />

Boise Element<br />

1910 University Drive<br />

Boise, ID 83725-1140<br />

USA<br />

COM: (208) 334-9390<br />

LIST OF<br />

CONFERENCE REGISTRANTS<br />

573<br />

Annette G. Baisden<br />

Naval Aerospace Medical Inst. (Code 412)

Naval Air Station<br />

Pensacola, FL 32508-5600

USA<br />

A/V 922-2516 COM: (904) 452-2516<br />

Louis E. Banderet, Ph.D.<br />

U.S. Army Institute of Environmental<br />

Medicine<br />

Health and Performance Division<br />

Natick, MA 01760-5007<br />

USA<br />

A/V 256-4858 COM: (508) 651-4858<br />

Dr. David W. Bessemer<br />

US Army Research Institute<br />

Field Unit - Ft. Knox<br />

Ft. Knox, KY 40121-5620<br />

USA<br />

A/V 464-4932 COM: (502) 624-4932

LTCOL John Birkbeck<br />

MOD A Ed 4<br />

Court Road<br />

Eltham<br />

London SE9 5HR<br />

United Kingdom<br />

Dr. Walter C. Borman

Department of Psychology<br />

University of South Florida<br />

Tampa, FL 33620-8200<br />

USA<br />

Michael J. Bosshardt<br />

Personnel Decisions Research Institute<br />

43 Main St., S.E.<br />

Suite #505

Minneapolis, MN 55414<br />

USA<br />

COM: (612) 331-3680<br />

CAPT J. Peter Bradley<br />

Canadian Forces<br />

Personnel Applied Research Unit<br />

4900 Yonge St., Suite 600<br />

Willowdale, Ontario, M2N 697<br />

Canada<br />

A/V 827-4239 COM: (416) 224-4972



Dr. Elizabeth J. Brady<br />

U.S. Army Research Institute<br />

Attn: PERI-RS, 5001 Eisenhower Avenue<br />

Alexandria, VA 22333-5600<br />

USA<br />

A/V 284-0215 COM: (703) 274-8275<br />

David E. Brown, Jr.<br />

Metrica, Inc.<br />

8301 Broadway, Suite 215<br />

San Antonio, TX 78209<br />

USA<br />

LTCOL David E. Brown, Sr.<br />

AFHRLIMOM<br />

Brooks AFB, TX 78235-5601<br />

USA<br />

A/V 240-3942 COM: (512) 536-3942<br />

Lawrence S. Buck<br />

Planning Research Corporation<br />

1440 Air Rail Avenue<br />

Virginia Beach, VA 23455<br />

USA<br />

COM: (804) 460-2276<br />

Dr. Lloyd D. Burtch

Air Force Human Resources Laboratory/PR

Brooks AFB, TX 78235-5601<br />

USA<br />

A/V 240-3011 COM: (512) 536-3611<br />

Dr. Henry H. Busciglio<br />

U.S. Army Research Institute<br />

Attn: PERI-RS, 5001 Eisenhower Avenue<br />

Alexandria, VA 22333-5600<br />

USA<br />

A/V 284-8275 COM: (703) 274-8275<br />

Charlotte H. Campbell<br />

Human Resources Research Organization<br />

295 W. Lincoln Trail Boulevard<br />

Radcliff, KY 40160<br />

USA<br />

J. R. Dick Campbell<br />

Air Traffic Services Transport Canada<br />

1574 Champneuf Dr.<br />

Orleans, Ontario KlC 6B5<br />

Canada<br />

COM: W(613) 998-6617 H(613) 837-0440<br />

574<br />

Roy C. Campbell<br />

Human Resources Research Organization<br />

295 W. Lincoln Trail Boulevard<br />

Radcliff, KY 40160-2042<br />

USA<br />

CAPT William J. Carle<br />

6435 Crestway Dr., #174<br />

San Antonio, TX 78239<br />

USA<br />

A/V 487-3694 COM: (512) 652-3694<br />

Norman A. Champagne<br />

Naval Education and Training Program<br />

Management Support Activity (Code 03172)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1355 COM: (904) 452-1355<br />

Dr. Herbert J. (Jim) Clark

3410 Prince George<br />

San Antonio, TX 78233<br />

USA<br />

A/V 240-3169 COM: (512) 536-3611<br />

Harry A. Clark III<br />

8265 Campobello<br />

San Antonio, TX 78218<br />

USA<br />

A/V 487-5234 COM: (512) 652-5234<br />

Dennis D. Collins<br />

HQDA, DAPE-MR, Rm. 2C733<br />

The Pentagon<br />

Washington, DC 20310-0300<br />

USA<br />

A/V 225-9213 COM: (202) 695-9213<br />

Dr. Harry B. Conner<br />

Navy Research and Development Center<br />

Code 142<br />

San Diego, CA 92123-6800<br />

USA<br />

A/V 553-6675 COM: (619) 553-6675<br />

MAJ Anthony J. Cotton<br />

1 Psych Research Unit<br />

P.O. Box E33<br />

Queen Victoria Tee<br />

Barton Act 2600<br />

Australia


Jack R. Dempsey<br />

Human Resources Research Organization<br />

1100 South Washington Street<br />

Alexandria, VA 22314<br />

USA<br />

COM: (703) 549-3611<br />

Dr. Grover E. Diehl<br />

Eval. & Research Branch<br />

USAF Extension Course Inst.<br />

Gunter AFB, AL 36118-5643<br />

USA<br />

A/V 446-3641 COM: (205) 279-3641<br />

CAPT Joseph M. Donnelly<br />

46 Walcheren Loop<br />

Borden, Ontario LOM ICO<br />

Canada<br />

A/V 270-3917 COM: (705) 423-3917<br />

David A. DuBois<br />

Personnel Decisions Research Institute<br />

43 Main St., S.E.

Suite #405, Riverplace<br />

Minneapolis, MN 55414<br />

USA<br />

COM: (612) 331-3680<br />

MAJ R. Eric Duncan<br />

10109 Trapper's Ridge<br />

Converse, TX 78109<br />

USA<br />

Dale R. Eckard<br />

Naval Education and Training Program<br />

Management Support Activity (Code 3163)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1792 COM: (904) 452-1792<br />

Jack E. Edwards

Navy Personnel R&D Center (Code 121)<br />

San Diego, CA 92111-6800<br />

USA<br />

A/V 553-7630 COM: (619) 553-7630<br />


Dr. Timothy W. Elig

U.S. Army Research-Institute (PERI-RG)<br />

5001 Eisenhower Avenue<br />

Alexandria, VA 22333-5600<br />

USA<br />

A/V 354-5786 COM: (703) 274-5610<br />

MAJ Philip J. Exner<br />

Manpower Analysis, Eval & Coordination

Headquarters, U.S. Marine Corps<br />

Washington, DC 20380-0001<br />

USA<br />

A/V 224-4165 COM: (703) 614-4165<br />

Frank Fehler<br />

Flugplatz<br />

3062 Bueckeburg<br />

Federal Republic of Germany<br />

PH: (Germany) 05722-4001, Ext 307<br />

Dr. Daniel B. Felker

3333 K Street, NW<br />

Washington, DC 20007<br />

USA<br />

COM: (202) 342-5000<br />

Dr. Fred E. Fiedler<br />

Department of Psychology<br />

University of Washington<br />

Seattle, WA 98195<br />

USA<br />

A/V 88-473-2032 COM: (512)671-2032<br />

Dorothy L. Finley<br />

Army Research Institute<br />

Bldg. 41203 Attn: PERI-IG (Finley)<br />

Ft. Gordon, GA 30905-5230<br />

USA<br />

A/V 780-5523 COM: (404) 791-5523<br />

CAPT David C. Fischer<br />

HQ AFOTEC/OAH2<br />

Kirtland AFB, NM 87117-7001<br />

USA<br />

A/V 244-4201 COM: (505) 846-4201<br />

Dr. John C. Eggenberger
Director, Pers Applied Research & Tng
SNC Defence Products Ltd.
Heritage Place, 155 Queens Street, #132
Ottawa, Ontario K1P 6L1
Canada
COM: (613) 238-7216

Dr. Max H. Flach
Bundesminister der Verteidigung
-FuSI8-
Postfach 1328
D-5300 Bonn 1
Federal Republic of Germany

575


COL James C. Fleming<br />

National Defence Headquarters<br />

101 Colonel By Drive<br />

Ottawa, Ontario KIA OK2<br />

Canada<br />

A/V 642-3507 COM: (613) 992-3507<br />

Mr. John W.K. Fugill

49 Dalton Rd.<br />

St. Ives, NSW 2075<br />

Australia<br />

(02) 4009243<br />

LTCOL Frank C. Gentner<br />

ASDIALHA<br />

MPT Analysis & Info System Division<br />

Wright-Patterson AFB, OH 45431<br />

USA<br />

Alice Gerb<br />

Director, <strong>Military</strong> Programs Office<br />

Educational <strong>Testing</strong> Service<br />

Rosedale Road<br />

Princeton, NJ 08541<br />

USA<br />

COM: (609) 921-9600<br />

Constance A. Gillan
550 West Pennsylvania Avenue #8

San Diego, CA 92103<br />

USA<br />

A/V 735-7195 COM: (619) 545-7195<br />

Christa A. Grier

Naval Education and Training Program<br />

Management Support Activity (Code 3124)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1765 COM: (904) 452-1765<br />

Wulf Gronwald<br />

D23<br />

Infanteriestr. 17<br />

8000 Munchen 40<br />

Federal Republic of Germany<br />

(089) 3069-2417<br />

2LT Jody A. Guthals<br />

Air Force Human Resources Lab (MOD)<br />

MPT Technology Branch<br />

Brooks AFB, TX 78235-5601

USA<br />

A/V 240-3677 COM: (512) 536-3677<br />

576<br />

Dr. Michael W. Habon<br />

Post Fach 1420<br />

Dornier GmbH, Dept. E&WI<br />

D-7990 Friedrichshafen<br />

Federal Republic of Germany<br />

MAJ Martin P. Hankes-Drielsma<br />

National Defense Headquarters<br />

Ottawa, Ontario KlS 3A8<br />

Canada<br />

Dr. Dieter H.D. Hansen<br />

MOD (Armed Forces Staff)<br />

Postbox 1328<br />

D-5300 Bonn 1<br />

Federal Republic of Germany<br />

Telefax (0228) 12 9059<br />

Mary Ann Hanson<br />

Personnel Decisions Research Institute<br />

43 Main St., SE<br />

Suite #405<br />

Minneapolis, MN 55414<br />

USA<br />

COM: (612) 331-3680<br />

CAPT Johnnie C. Harris<br />

USAF Occupational Measurement Center<br />

Attn: OMVD<br />

Randolph AFB, TX 78150-5000<br />

USA<br />

.Mary Ellen Hartmann<br />

Questar Data Systems, Inc.

2905 West Service Road<br />

Eagan, MN 55121-2199

USA<br />

CDR Robert B. Hawkins<br />

Chief of Naval Education and Training<br />

Code N31T, Naval Air Station<br />

Pensacola, FL 32508-5100<br />

USA<br />

Dr. Charles W. Hesse

Naval Aviation and Training Program<br />

Management Support Activity (Code 313)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1579 COM: (904) 452-1579



Mr. G. J. (Jeff) Higgs<br />

National Defence Headquarters<br />

101 Colonel By Drive (Attn: DMOS 3)<br />

Ottawa, Ontario KlA OX2<br />

Canada<br />

A/V 842-7069 COM: (613) 992-7069<br />

CAPT D. Wayne Hintze

Manpower Analysis, Eva1 & Coordination<br />

HQ, U.S. Marine Corps (Code MA)

Washington, DC 20380-0001<br />

USA<br />

A/V 224-4165 COM: (703) 614-4165<br />

Mr. Charles R. Hoshaw<br />

5920 Brookview Drive<br />

Alexandria, VA 22310<br />

USA<br />

COM: (703) 694-5511<br />

Janis S. Houston<br />

Personnel Decisions Research Institute<br />

43 Main St., S.E.<br />

Riverplace Suite #405<br />

Minneapolis, MN 55414<br />

USA<br />

COM: (612) 331-3680<br />

Dr. DeLayne R. Hudspeth<br />

College of Education (EDB406)<br />

The University of Texas<br />

Austin, TX 78712<br />

USA<br />

COM: (512) 471-5211<br />

Dr. Alain Hunter<br />

Technical Director<br />

NMPC DET NODAC<br />

Bldg. 150 WNY (Anacostia)<br />

Washington, DC 20374-1501<br />

USA<br />

A/V 288-4620 COM: (202) 433-4620<br />

Barbara A. Jezior<br />

U.S. Army Natick R,D,E Ctr-STRNC-YB<br />

Kansas Street<br />

Natick, MA 01760-5020<br />

USA<br />

A/V 256-5523 COM: (508) 651-5523<br />


577<br />

Wayne E. Keates<br />

Personnel Applied Research & Training<br />

Division - SNC Defense Products, Ltd.<br />

155 Queen St., Suite 1302
Ottawa, Ontario K1P 6L1

Canada<br />

COM: (613) 238-7216<br />

Dr. Robert S. Kennedy<br />

Vice President, Essex Corporation<br />

1040 Woodcock Road, #227<br />

Orlando, FL 32803<br />

USA<br />

COM: (407) 894-5090<br />

CDR Robert H. Kerr<br />

Canadian Forces Fleet School<br />

FM0 Halifax<br />

Nova Scotia B3K 2X0<br />

Canada<br />

A/V 447-8054 COM: (902) 427-8054

Rex G. Kinder<br />

Rexton Consulting Services, Pty Ltd.<br />

P.O. Box 382, Manly<br />

NSW 2095<br />

Australia<br />

Robert W. King<br />

4055 Bedevere Dr.<br />

Pensacola, FL 32514<br />

USA<br />

A/V 922-1663 COM: (904) 452-1663<br />

Thomas P. Kirchenkamp<br />

Dornier GmbH<br />

P.O. Box 1420. Dept. WTWI<br />

D-7990 Friedrichshafen 1<br />

Federal Republic of Germany<br />

49-7545-5775<br />

Dr. Paul Klein

Sozialwissenschaftliches Inst.<br />

der Bundeswehr, Winzererstr. 52
D-8000 Munchen 40

089 12003233<br />

Wolf Knacke<br />

Streitkrafteamt (Armed Forces Office)<br />

- I 7 / Militarpsychologie -<br />

Postfach 20 50 03<br />

D-5300 Bonn 2
Federal Republic of Germany


Dr. John L. Kobrick<br />

US Army Research Institute<br />

of Environmental Medicine<br />

Kansas Street<br />

Natick, MA 01760<br />

USA<br />

A/V 256-4885 COM: (508) 651-4885<br />

Fay J. Landrum<br />

Naval Aviation and Training Program<br />

Management Support Activity (Code 3161)

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1736 COM: (904) 452-1736<br />

Richard S. Lanterman<br />

U.S. Coast Guard HQ (G-PWP-2), Room 4111<br />

2100 Second St., S.W.

Washington, DC 20593-0001<br />

USA<br />

COM: (202) 267-2986<br />

Dr. James M. Lentz<br />

Naval Education and Training Program<br />

Management Support Activity (Code 301)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1685 COM: (904) 452-1685<br />

Dr. Carl W. Lickteig<br />

U.S. Army Research Institute<br />

Field Unit Ft Knox (Attn: PERI-IK)<br />

2423 Morande Street<br />

Fort Knox, Ky 40121-5620<br />

USA<br />

A/V 464-7046 COM: (502) 624-7046<br />

COL Michael Lindquist<br />

Military Education Division J-7

Pentagon, Room lA724<br />

Washington, DC 20318-7000<br />

USA<br />

Dr. Suzanne Lipscomb<br />

AFHRL/PRP

Brooks AFB, TX 78235-5601<br />

USA<br />

Richard M. Lopez<br />

Naval Aviation and Training Program<br />

Management Support Activity (Code 3171)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1357 COM: (904) 452-1357<br />

578<br />

Donald F. Lupone<br />

Naval Education and Training Program<br />

Management Support Activity (Code 315)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1777 COM: (904) 452-1777<br />

Dr. Fred A. Mae1<br />

U.S. Army Research Institute<br />

5001 Eisenhower Avenue<br />

Alexandria, VA 22333-5600<br />

USA<br />

A/V 284-8275 COM: (202) 274-8275<br />

Dr. Rolland R. Mallette, Major (Ret)

Industrial Psychologist<br />

Ontario Hydro, 700 University Ave (H3-G27)

Toronto, Ontario M5G 1X6<br />

Canada<br />

COM: (416) 592-7038<br />

LTC Ken A. Martell<br />

6308 Falling Brook Drive<br />

Burke, VA 22015<br />

USA<br />

A/V 225-456012225 COM: 202-695-4560<br />

Ms. Nora E. Matos<br />

Naval Education and Training Program<br />

Management Support Activity (Code 312)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1388 COM: (904) 452-1388<br />

Dr. James R. McBride<br />

6430 Elmhurst Dr.<br />

San Diego, CA 92120<br />

USA<br />

COM: (619) 582-0200<br />

Dean C. McCallum<br />

Naval Education and Training Program<br />

Management Support Activity (Code 313)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1648 COM: (904) 452-1648<br />

Donald E. McCauley, Jr.

Office of Rsch & Development<br />

Room 6451<br />

Office of Personnel Management<br />

Washington, DC 20415<br />

USA<br />

COM: (202) 606-0880


Deborah L. McCormick<br />

Chief of Naval Technical Training

Attn: N6211<br />

NAS Memphis<br />

Millington, TN 38054-5056<br />

USA<br />

A/V 966-5865 COM: (901) 873-5865<br />

Harold M. McCurry<br />

1919 Baldwin Brook Dr.<br />

Montgomery, AL 36116<br />

USA<br />

COM: (205) 279-5382<br />

Edward McFadden<br />

Atlanta Military Entrance Processing Station

M.L. King Federal Annex; Ground Floor<br />

77 Forsyth Street SW<br />

Atlanta, GA 30303-3427<br />

USA<br />

Dr. Albert H. Melter<br />

Personalstammamt der Bundeswehr<br />

Koelner Str. 262<br />

D-5000 Koeln 90<br />

Federal Republic of Germany<br />

Central Personnel Office<br />

German Federal Armed Forces<br />

PH: (Germany) (02203) 12021 472<br />

MAJ Harold C. Mendes<br />

520 Larochelle<br />

Saint Jean-<br />

Quebec J3B lJ5<br />

Canada<br />

LT Mark R. Miller<br />

12474 Starcrest #210<br />

San Antonio, TX 78216<br />

USA<br />

A/V 240-3222 COM: (512) 536-3222<br />

William M. Minter<br />

ECI/EDC

U.S. Air Force<br />

Gunter AFB, AL 36118<br />

USA

A/V 446-4151 COM: (205) 279-4151<br />

579<br />

Dr. Angelo Mirabella<br />

U.S. Army Research Institute<br />

5001 Eisenhower Avenue<br />

Alexandria, VA 22333-5600<br />

USA<br />

A/V 284-8827 COM: (202) 274-8827<br />

Dr. Jimmy L. Mitchell<br />

McDonnell Douglas Missile Systems Co.

8301 North Broadway, Suite 211<br />

San Antonio, TX 78109<br />

USA<br />

COM: (512) 826-8664<br />

William E. Montague<br />

Training Technology<br />

Navy Personnel R&D Center (Code 15A)<br />

San Diego, CA 92152-6800<br />

USA<br />

A/V 553-7849 COM: (619) 553-7849<br />

LCDR Tom Morrison<br />

Naval Aerospace Medical Institute<br />

Code 412<br />

Naval Air Station<br />

Pensacola, FL 32508-5600<br />

USA<br />

A/V 922-2615 COM: (904) 452-2615<br />

Dr. C. Jill Mullins<br />

Chief of Naval Education & Training<br />

N-11, Bldg. 628<br />

Naval Air Station<br />

Pensacola, FL 32538-5100<br />

USA<br />

A/V 922-4207 COM: (904) 452-4207<br />

James Gerald Murphy<br />

Naval Education and Training Program<br />

Management Support Activity (Code 03171)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1414 COM: (904) 452-1414<br />

CAPT Edward L. Naro<br />

Naval Military Personnel Command<br />

Navy Occupational Dev. & Analysis Center<br />

Bldg. 150, WNY (Anacostia)<br />

Washington, DC 20374-1501<br />

USA<br />

A/V 288-5488 COM: (202) 433-5488


Joe H. Neidig<br />

Naval Education and Training Program<br />

Management Support Activity (Code 3111)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1729 COM: (904) 452-1729<br />

Ms. Mary L. Norwood<br />

U.S. Coast Guard HQ (G-PWP-2), Room 4111<br />

2100 Second St., SW<br />

Washington, DC 20593-0001<br />

USA<br />

Dr. Lawrence H. O'Brien<br />

Dynamics Research Corporation<br />

60 Concord Street<br />

Wilmington, MA 02174<br />

USA<br />

COM: (508) 658-6100<br />

Brian S. O'Leary<br />

US Office of Personnel Management<br />

Room 6451<br />

1900 E Street, N.W.<br />

Washington, DC 20415<br />

USA<br />

COM: (202) 606-0880<br />

Robert C. Pallme<br />

Naval Education and Training Program<br />

Management Support Activity (Code 311)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1728 COM: (904) 452-1728<br />

Dr. Dale R. Palmer<br />

U.S. Army Research Institute<br />

5001 Eisenhower Avenue<br />

Alexandria, VA 22333-5600<br />

USA<br />

A/V 284-8275 COM: (703) 284-8275<br />

Stephen W. Parchman<br />

NPRDC<br />

Code 15<br />

San Diego, CA 92152-6800<br />

USA<br />

A/V 553-7794 COM: (619) 553-7794<br />

Randolph Park<br />

American Institutes for Research<br />

3333 K St., NW<br />

Washington, DC<br />

USA<br />

COM: (202) 342-5000<br />

Dr. John J. Pass<br />

927 Nautilus Isle<br />

Dania, FL 33004<br />

USA<br />

Robert H. Pennington<br />

Naval Education and Training Program<br />

Management Support Activity (Code 314)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1547 COM: (904) 452-1547<br />

Carlene M. Perry<br />

United States Air Force Academy<br />

P.O. Box 4269<br />

US Air Force Academy, CO 80841<br />

USA<br />

COM: (719) 472-4551<br />

Dr. Mark G. Pfeiffer<br />

NAVTRASYSCEN Training Analysis & Evaluation<br />

Code 121<br />

12350 Research Parkway<br />

Orlando, FL 32826-3224<br />

USA<br />

A/V 960-4132 COM: (407) 380-4132<br />

William J. Phalen<br />

AFHRL/MOD<br />

Brooks AFB, TX 78235-5600<br />

USA<br />

A/V 240-3677 COM: (512) 536-3677<br />

Squadron Leader John S. Price<br />

Royal Australian Air Force<br />

U.S. Air Force Human Resources<br />

Laboratory (AFHRL/MOD)<br />

Brooks AFB, TX 78235-5000<br />

USA<br />

A/V 240-3648 COM: (512) 536-3648


COL Terry J. Prociuk<br />

Director of Personnel Psychology and<br />

Sociology<br />

National Defense Headquarters<br />

Ottawa, Ontario K1A 0K2<br />

Canada<br />

A/V 042-0244 COM: (613) 992-0244<br />

Dr. Wiebke Putz-Osterloh<br />

Lehrstuhl Psychologie<br />

Universität Bayreuth<br />

Postfach 10151, D-8580 Bayreuth<br />

Federal Republic of Germany<br />

University of Bayreuth<br />

(0921) 55700<br />

Martin L. Rauch<br />

Federal Ministry of Defense, P II 4<br />

Postfach 1328<br />

5300 Bonn 1<br />

Federal Republic of Germany<br />

COM: 49-228-128543<br />

LT Daniel T. Reeves<br />

Canadian Forces<br />

Personnel Applied Research Unit<br />

4900 Yonge St., Suite 600<br />

Willowdale, Ontario, M2N 6B7<br />

Canada<br />

A/V 027-4239 COM: (416) 224-4968<br />

Beatrice Julie Rheinstein<br />

Office of Personnel Management<br />

1900 E Street, Room 6451<br />

Washington, DC 20415-5000<br />

USA<br />

COM: (202) 606-2694<br />

William M. Ritchie<br />

National Defense Headquarters<br />

101 Colonel By Drive<br />

Ottawa, Ontario K1A 0K2<br />

Canada<br />

Dr. Gwyn Robson<br />

Marine Corps Institute<br />

P. 0. Box 1775<br />

Arlington, VA 22222-0001<br />

USA<br />

A/V 288-4109 COM: (202) 433-4109<br />

Dr. Gerd Rodel<br />

Freiwilligenannahmezentrale der Marine<br />

Ebkeriege 35191<br />

D-2940 Wilhelmshaven<br />

Federal Republic of Germany<br />

COM: (04421) 792124<br />

Earl F. Roe<br />

Naval Education and Training Program<br />

Management Support Activity (Code 0317)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1335 COM: (904) 452-1335<br />

C1C Diane L. Romaglia<br />

United States Air Force Academy<br />

P.O. Box 4405<br />

U.S. Air Force Academy, CO 80841<br />

USA<br />

A/V 259-4537 COM: (719) 472-4533<br />

Kendall L. Roose<br />

Training Department<br />

Training Air Wing Five<br />

NAS Whiting Field<br />

Milton, FL 32570-5100<br />

USA<br />

A/V 868-7266 COM: (904) 623-7266<br />

COL Dr. Ger J.C. Roozendaal<br />

DPKL/afd GW<br />

Postbus 90701<br />

2509 LS The Hague<br />

The Netherlands<br />

COM: 31-71-6135450<br />

Sandra A. Rudolph<br />

Chief of Naval Technical Training<br />

Bldg C-1, Code N632<br />

NAS Memphis<br />

Millington, TN 38054-5056<br />

USA<br />

A/V 966-5591 COM: (901) 873-5591<br />

Roberto B. Salinas<br />

USAFOMSQ/OMYO<br />

Randolph AFB, TX 78150<br />

A/V 487-6811 COM: (512) 652-6811


MAJ Charles A. Salter<br />

Natick Research, Dev. & Eng. Center<br />

10 East Militia Hts.<br />

Needham, MA 02192<br />

USA<br />

A/V 256-4901 COM: (508) 651-4901<br />

Mr. William A. Sands<br />

Navy Personnel Research and<br />

Development Center (NPRDC)<br />

Testing Systems Department (Code 13)<br />

San Diego, CA 92152-6800<br />

USA<br />

A/V 553-9266 COM: (619) 553-9266<br />

Jerry Scarpate<br />

DEOMI/DR<br />

Patrick AFB, FL 32925-6685<br />

USA<br />

Sibylle B. Schambach<br />

c/o Federal Armed Forces Admin Office<br />

Bonner Talweg 177<br />

D-5300 Bonn<br />

Federal Republic of Germany<br />

PH: (Germany) 228-122099<br />

Dr. Amy C. Schwartz<br />

U.S. Army Research Institute<br />

5001 Eisenhower Avenue<br />

Alexandria, VA 22333<br />

USA<br />

A/V 284-0275 COM: (703) 274-0275<br />

LCDR James W. Shafovaloff<br />

Commandant (G-PWP-2)<br />

U.S. Coast Guard Headquarters<br />

2100 2nd St., S.W., Room 4111<br />

Washington, DC 20593-0001<br />

USA<br />

FTS 8-267-1954 COM: (202) 267-1954<br />

Dr. Joyce Shettel-Neuber<br />

NPRDC<br />

San Diego, CA 92152-6800<br />

USA<br />

A/V 553-7940<br />

Dr. Guy L. Siebold<br />

US Army Research Institute, (PERI-IL)<br />

5001 Eisenhower Avenue<br />

Alexandria, VA 22333-5600<br />

USA<br />

A/V 204-8293 COM: (703) 274-0293<br />

Brian W. D. Slack<br />

Ontario-Hydro<br />

P.O. Box 338<br />

Orangeville,<br />

Ontario L9W 2Z7<br />

Canada<br />

COM: (519) 941-4620<br />

LT Wilfried A. Slowack<br />

CRS Set Psy Ond<br />

Bruynstraat<br />

B-1120 Brussels<br />

Belgium<br />

PH: (Belgium) 02-2680050, Ext. 3279<br />

Dr. Robert M. Smith<br />

Naval Aviation Schools Command<br />

Code 12, Bldg. 633, Room 137<br />

Naval Air Station<br />

Pensacola, FL 32500-5400<br />

USA<br />

A/V 922-4120 COM: (904) 452-4120<br />

Dr. J. Michael Spector<br />

AFHRL/IDC<br />

Brooks AFB, TX 78235-5601<br />

USA<br />

A/V 240-3036 COM: (512) 536-3036<br />

Yvonne W. Squires<br />

10968 Portobelo Dr.<br />

San Diego, CA 92124<br />

USA<br />

A/V 553-8264 COM: (619) 553-0264<br />

Herb C. Stacy<br />

Chief of Naval Technical Training<br />

Bldg C-1, Code N5A<br />

NAS Memphis<br />

Millington, TN 38054-5056<br />

USA<br />

A/V 966-5984 COM: (901) 873-5984<br />

Michael R. Staley<br />

7521 126th Avenue<br />

Kirkland, WA 98033<br />

USA<br />

COM: (206) 869-5501


Paul P. Stanley II<br />

USAFOMC/OMD<br />

Randolph AFB, TX<br />

78150-5000<br />

USA<br />

A/V 481-5234 COM: (512) 652-5234<br />

Dr. Alma G. Steinberg<br />

U.S. Army Research Institute<br />

Attn: PERI-IL<br />

5001 Eisenhower Avenue<br />

Alexandria, VA 22333-5600<br />

USA<br />

A/V 284-8293 COM: (703) 274-8293<br />

Stanley D. Stephenson<br />

Dept of Computer Information<br />

System & Admin Science<br />

Southwest Texas State Univ.<br />

San Marcos, TX 78666-4616<br />

USA<br />

COM: (512) 245-2291<br />

Dr. Lawrence J. Stricker<br />

Educational Testing Service<br />

Princeton, NJ 08541-0001<br />

USA<br />

COM: (609) 734-5551<br />

J. S. Tartell<br />

USAF OMSQ/OMY<br />

Randolph AFB, TX 78150-5000<br />

USA<br />

DSN 481-6623 COM: (512) 652-6623<br />

John W. Thain<br />

583 Cypress<br />

Monterey, CA 93940<br />

USA<br />

A/V 818-5164 COM: (408) 641-5764<br />

William J. Tharion<br />

Health & Performance Division<br />

USA Research Inst of Env Med<br />

Natick, MA 01760-5001<br />

USA<br />

A/V 256-4115 COM: (508) 651-4115<br />

Philip A. Thornton<br />

U.S. Coast Guard HQ (G-PWP-2), Room 4111<br />

2100 Second Street S.W.<br />

Washington, DC 20593-0001<br />

USA<br />

COM: (202) 267-1954<br />

LCDR Barbara T. Transki<br />

Navy Occupational Dev & Analysis Center<br />

Bldg 150, WNY Anacostia<br />

Washington, DC 20374-1501<br />

USA<br />

A/V 288-4633 COM: (202) 433-4633<br />

Thomas Trent<br />

Testing System Dept., Code 132<br />

Navy Personnel R&D Center<br />

San Diego, CA 92152-6800<br />

USA<br />

A/V 553-7637 COM: (619) 553-7637<br />

Ms. Susan Truscott<br />

Dir of Social & Economic Analysis<br />

Operational Research/Analysis Establishment<br />

101 Colonel By Drive<br />

Ottawa, Ontario K1A 0K2<br />

Canada<br />

Dr. James W. Tweeddale<br />

Chief of Naval Education and Training<br />

NROTC Division<br />

NAS Pensacola<br />

Pensacola, FL 32508<br />

USA<br />

A/V 922-4983 COM: (904) 452-4903<br />

Dr. Lloyd W. Wade<br />

Special Programs Department<br />

Marine Corps Institute<br />

Arlington, VA 22222-0001<br />

USA<br />

A/V 288-2612 COM: (202) 415-9229<br />

Dr. Raymond O. Waldkoetter<br />

U.S. Army Soldier Spt Center<br />

Attn: ATSG-DDN (Bldg 401)<br />

Fort Harrison, IN 46216-5700<br />

USA<br />

A/V 699-3819 COM: (317) 542-3879<br />

Aubrey E. Walker<br />

U.S. Army Infantry School<br />

Attn: ATSH-TDT-I<br />

Fort Benning, GA 31905-5593<br />

USA<br />

Clarence L. Walker<br />

Rt. 1, Box 593<br />

Purcellville, VA 22132<br />

USA<br />

COM: (703) 669-6427<br />


Dr. Brian K. Waters<br />

HumRRO<br />

1100 South Washington Street<br />

Alexandria, VA 22314-4499<br />

USA<br />

COM: (703) 706-5647<br />

Johnny J. Weissmuller<br />

Metrica, Inc.<br />

8301 Broadway, Suite 215<br />

San Antonio, TX 78217<br />

USA<br />

COM: (512) 822-6600<br />

LTCOL Karol W.J. Wenek<br />

Military Leadership & Management Dept<br />

Royal Military College of Canada<br />

Kingston, Ontario K7K 5L0<br />

Canada<br />

COM: (613) 541-6304<br />

James D. Wiggins<br />

Naval Education and Training Program<br />

Management Support Activity (Code 3104)<br />

Pensacola, FL 32509-5000<br />

USA<br />

A/V 922-1323 COM: (904) 452-1323<br />

CDR Frederick F.P. Wilson<br />

Canadian Forces Personnel Applied<br />

Research Unit<br />

4900 Yonge St., Suite 600<br />

Willowdale, Ontario M2N 6B7<br />

Canada<br />

COM: (416) 224-4964<br />

Dr. Lauress L. Wise<br />

DMDC<br />

99 Pacific St., #155A<br />

Monterey, CA 93940-2453<br />

USA<br />

COM: (408) 655-4000<br />

Dr. Martin F. Wiskoff<br />

307A Mar Vista Drive<br />

Monterey, CA 93940<br />

USA<br />

COM: (408) 373-3073<br />

Darrell A. Worstine<br />

Commander, USAPIC<br />

Attn: ATNC-MO<br />

200 Stovall Street<br />

Alexandria, VA 22332-1330<br />

USA<br />

A/V 221-3250 COM: (703) 325-3250<br />

Timothy C. Zello<br />

U.S. Army Ordnance Center<br />

Attn: ATSL-MD<br />

Aberdeen Proving Ground, MD 21005-5201<br />

USA<br />

A/V 298-4115 COM: (301) 278-4115


INDEX OF AUTHORS<br />

Abrahams, N. M., 486<br />

Albert, W. G., 310, 316<br />

Alderks, C. E., 432<br />

Alley, F., 292<br />

Arabian, J. M., 226<br />

Arndt, K., 104<br />

Ashworth, MAJ R. L., Jr., 199<br />

Baker, H., 292, 298, 304<br />

Banderet, L. E., 334, 339<br />

Bayes, A. H., 425<br />

Bennett, W. R., 116<br />

Bergquist, Maj T. M., 156<br />

Bessemer, D. W., 150<br />

Borman, W. C., 268, 492, 498, 504<br />

Bosshardt, M., 504, 505, 516<br />

Bowler, E. C., 535<br />

Bradley, Capt. J. P., 262<br />

Brady, E. J., 322<br />

Brooks, J. T., 541<br />

Brown, G. C., 553<br />

Buck, L. S., 274<br />

Buckenmyer, D. V., 116<br />

Burch, R. L., 486<br />

Busciglio, Henry H., 380<br />

Campbell, C. H., 528, 541<br />

Campbell, R. C., 528, 529, 541<br />

Carle, W. J., 541<br />

Clark, H. J., 460<br />

Collins, D. D., 414<br />

Conner, Dr. H. B., 312<br />

Crafts, J. L., 535<br />

Crawford, K., 504, 516<br />

Crawford, R. L., PhD, 167<br />

Cymerman, A., 408<br />

Dart, 1Lt T. S., 156<br />

Dauphinee, SSG D. T., 339<br />

Dempsey, J. R., 25<br />

Dhammanungune, S., 304<br />

Diehl, G. E., 128<br />

Dittmar, M. J., 316<br />

Doyle, E. L., 529<br />

Dubois, D., 504, 505, 516<br />

Dunlap, W. P., 220<br />

Edwards, J. E., 31, 486<br />

Eggenberger, J. C., PhD, 167<br />

Elig, T. W., 19<br />

Ellis, J. A., 132<br />

Evans, R. M., 191<br />

Exner, Maj P. J., 535<br />

Fayfich, P. R., 70<br />

Fehler, F., 180<br />

Felker, D. B., 535<br />

Fiedler, E., 392<br />

Finley, D. L., 94, 99<br />

Fowlkes, J. E., 220<br />

Goldberg, E. L., 474<br />

Greene, C. A., 241<br />

Guthals, 2Lt J. A., 76, 156<br />

Hand, D. K., 82, 316<br />

Hansen, H. D., 351<br />

Hanson, M. A., 268, 498<br />

Harris, D. A., 25<br />

Harris, J. C., 547<br />

Harris, J. H., 528<br />

Hawkins, R. B.<br />

Heslin, Captain, 174<br />

Houston, J., 504, 522<br />

Hoyt, R., 408<br />

Hudspeth, Dr. D. R., 70<br />

Ince, V., 241<br />

Jezior, B. A., 241<br />

Johnson, R. F., 210<br />

Jones, M. B., 419<br />

Jones, P. L., 122<br />

Kennedy, R. S., 220, 419<br />

Kittredge, R., 408<br />

Klein, P., 88<br />

Knight, J. R., 116<br />

Kobrick, J. L., 210<br />

Koger, Major M. E., 174<br />

Laabs, G. J., 398<br />

Leaman, J. A., 455<br />

Lescreve, F., 216<br />

Lesher, L. L., 241<br />

Lester, L. S., 280<br />

Lickteig, C. W., 174<br />

Lieberman, H. R., 334<br />

Lindsay, T. J., 438<br />

Luisi, T. A., 280<br />

Luther, S. M., 280<br />

Mael, F. A., 286<br />

Marlowe, B. E., 408<br />

Martell, LTC K. A., 6<br />

Mayberry, P. W., 535<br />

McCauley, Jr., D. E., 51, 58, 64<br />

McCormick, D. L., 122<br />

McGee, S. D., 404<br />

McMenemy, D. J., 210<br />

Melter, A. H., 357<br />

Menchaca, Capt J., Jr., 76<br />

Mentges, W., 357<br />

Mirabella, A., 162<br />

Mitchell,


Muraida, D. J. 185<br />

O’Brien, L. H. 251<br />

O’Leary, B. S. 51, 58, 64<br />

O’Mara, M. 339<br />

Olivier, L. 76<br />

Owens-Kurtz, C. K. 492<br />

Palmer, D. R. 328<br />

Parchman, S. W. 132<br />

Paullin, C. 498<br />

Perez, CPT P. J. 334<br />

Perry, C. M. 235<br />

Pfeiffer, G. 76<br />

Pfeiffer, M. G. 191<br />

Phalen, W. J. 82, 310, 316<br />

Phelps, Dr. R. H. 199<br />

Pimental, N. A. 339<br />

Popper, R. 241<br />

Price, J. S., Squadron Leader 70<br />

Putz-Osterloh, W. 362<br />

Quenette, M. A. 398<br />

Reeves, Lt(N) D. T. 12<br />

Rheinstein, J. 51, 58, 64<br />

Riley, SGT R. H. 339<br />

Rodel, G. 368<br />

Romaglia, C1C D. L. 345<br />

Roozendaal, Col. G. J. C. 466<br />

Rosenfeld, P. 31<br />

Rudolph, S. A. 204<br />

Rumsey, M. G. 322<br />

Rushano, T. M. 386<br />

Russell, T. L. 492<br />

Salter, MAJ C. A. 280<br />

Sands, M. 298<br />

Sands, W. A. 245<br />

Schambach, S. B. 110<br />

Schwartz, A. C. 226, 256<br />

Sheposh, J. P. 474<br />

Sherman, F. 504, 522<br />

Shettel-Neuber, J. 474<br />

Shukitt-Hale, B. L. 334<br />

Siebold, G. L. 438, 444<br />

Silva, J. M. 256<br />

Simpson, LTC R. L. 314<br />

Skinner, J. 345<br />

Slowack, W. 216<br />

Spector, J. M. 185<br />

Spier, M. 304<br />

Spokane, A. 298<br />

Stanley II, P. P. 235, 547<br />

Steinberg, A. G. 455<br />

Stephenson, J. A. 138<br />

Stephenson, S. D. 136, 144<br />

Swirski, L. 292<br />

Tartell, J. S. 547<br />

Thain, J. W. 231<br />

Tharion, W. J. 408<br />

Thomas, P. J. 31<br />

Toyota, SGT R. M. 339<br />

Trent, T. 398<br />

Truscott, S.<br />

Turnage, J. J. 229, 413<br />

Tweeddale, J. W. 480<br />

Vandivier, P. L. 453<br />

Van Hemel, S. 252<br />

Vaughan, D. S. 116<br />

Waldkoetter, R. O. 450<br />

Walker, C. L. 37<br />

Waters, B. K. 25<br />

White, L. A. 328<br />

White, W. R., Sr. 450<br />

Williams, J. E. 235<br />

Winn, LTC D. H. 6<br />

Wiskoff, M. 504, 505, 516, 522<br />

Witt, SSG C. E. 433<br />

York, W. J., Jr., 94, 99<br />

Young, M. C. 328<br />

Zimmerman, R. A. 504, 511<br />
