32nd ANNUAL CONFERENCE<br />
OF THE<br />
MILITARY TESTING ASSOCIATION<br />
Orange Beach, Alabama<br />
5 - 9 November 1990<br />
Proceedings<br />
Hosted by the<br />
Naval Education and Training<br />
Program Management Support Activity<br />
32nd Annual Conference of the <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong><br />
Hosted by the<br />
Naval Education and Training Program<br />
Management Support Activity<br />
Orange Beach, Alabama<br />
5 - 9 November 1990<br />
Conference Committee<br />
Chairperson: Commander Mary A. Adams<br />
Conference Coordinator: Mr. Robert King<br />
Chair, Program and Publications Subcommittee: Mr. Donald Lupone<br />
Chair, Facilities Subcommittee: Mr. William Adams<br />
Chair, Registration Subcommittee: Mr. Richard Lopez<br />
Chair, Social Subcommittee: Mr. Robert Pallme<br />
Chair, Public Relations Subcommittee: Mr. David Slover<br />
Chair, Memento Subcommittee: Mr. Dean McCallum<br />
Chair, Finance Subcommittee: LT Gary L. Waters<br />
Site Coordinator: Dr. Charles Hesse
Acknowledgements<br />
The success of the MTA Conference can be attributed to the<br />
dedication of individuals who worked many hours. The MTA<br />
Conference Committee members express their appreciation to the<br />
following people for their contributions to the Conference:<br />
Facilities<br />
Mr. William Adams (Chair)<br />
DMC Charles Alvare<br />
Ms. Sharon Benton<br />
CMCS Thomas A. Browning<br />
LICM Robert Carr<br />
Mr. Dale Eckard<br />
Mr. Al Farr<br />
PHC Carl Hinkle<br />
Ms. Jackie Hufman<br />
CECS Billy F. Johnson<br />
CECS John A. Lanclos<br />
Ms. Fay Landrum<br />
Mr. Frank Strayer<br />
Finance<br />
LT Gary L. Waters<br />
Memento<br />
Mr. Dean McCallum (Chair)<br />
Ms. Catherine Warfield<br />
Presentation Facilitators<br />
Mr. Gerald Murphy (Chair)<br />
Ms. Sharon Benton<br />
FTCS Robert Bloomquist<br />
AWCS David M. Devarney<br />
ETCS R. Elliott<br />
OTACS Robert H. Howe<br />
RPC Jeffery L. Kringle<br />
GSCS Robert Kuzirian<br />
RPC Frank Logan<br />
CWO Camilo D. Lomibao<br />
OTAC Mark A. Lowe<br />
JOC George Markfelder<br />
VNC Gail M. Ravy<br />
MRC Kenneth Shaw<br />
AKCS William Sims<br />
Program and Publications<br />
Mr. Donald Lupone (Chair)<br />
Mr. W. N. Presley Jr.<br />
Ms. Wilma Scofield<br />
Ms. Joanne Vendetti<br />
Public Relations<br />
Mr. David Slover (Chair)<br />
ETC Steve Anderson<br />
DMC Charles Alvare<br />
SMC Vic Barera<br />
Mr. Dave Bodin<br />
Maxwell Buchanan<br />
Mr. Norman Champagne<br />
Code 05 Department<br />
ATCS Joel Garner<br />
Mr. Frank Harwood<br />
MUCM David Johnson<br />
Mr. Don Phillips<br />
YN3 Mark Shinkle<br />
AXCS Gary Spoon<br />
Mr. Donald Wiggins<br />
Mr. Emery Williams<br />
Ms. Mary Wing<br />
Registration<br />
Mr. Richard Lopez (Chair)<br />
Mr. Earl F. Roe<br />
Mr. Michael Abney<br />
ISC P. Buchan<br />
STGCS P. D. Craig<br />
Mr. Ronald Dougherty<br />
Ms. Brenda Frederick<br />
Ms. Susan Godwin<br />
Mr. Larry Golding<br />
STGC J. M. Griffin<br />
Ms. Debbie Halberg<br />
RMC C. I. Hannah<br />
FTCS R. Langley
RMC M. McKay<br />
AWC M. A. Morris<br />
OTMC W. E. Parsons<br />
AWC T. T. Pearson<br />
Ms. Jane Reich<br />
Ms. Laura Roberts<br />
Ms. Anne Sayers<br />
ISCM T. Schroeder<br />
STGC E. C. Smith<br />
AWCM J. R. Thompson<br />
Ms. Marjorie Warsing<br />
STSC J. C. Whitaker<br />
Ms. Jo Ellen Wolf<br />
OTMCS R. A. Wood<br />
FTCM M. Young<br />
Site Coordinator<br />
Dr. Charles Hesse<br />
Social<br />
Mr. Robert Pallme (Chair)<br />
GMCS Ricardo Andres<br />
Ms. Ginger Andrews<br />
Ms. Nora Matos<br />
Mr. Joseph Neidig<br />
Mr. Charles Warner
FOREWORD<br />
These Proceedings of the 32nd Annual Conference of the <strong>Military</strong><br />
<strong>Testing</strong> <strong>Association</strong> document the presentations given at paper and<br />
panel sessions during the conference. The papers represent a<br />
broad range of topics by contributors from the military,<br />
industrial, and educational communities, both foreign and<br />
domestic. It should be noted that the papers reflect the<br />
opinions of the authors and do not necessarily reflect the<br />
official policy of any institution, government, or armed service.<br />
TABLE OF CONTENTS<br />
1990 CONFERENCE COMMITTEE .................................. iii<br />
ACKNOWLEDGEMENTS ........................................... iv<br />
FOREWORD ................................................... vi<br />
TABLE OF CONTENTS .......................................... vii<br />
OPENING SESSION ............................................ xvi<br />
PAPER PRESENTATIONS - MANPOWER<br />
101. TRUSCOTT, S., The Canadian Reserves: Current and<br />
Future Manpower...................................... 1<br />
102. MARTELL, LTC Kenneth A. and WINN, LTC Dennis H.,<br />
Accession Dynamics................................... 6<br />
103. Not Presented.<br />
104. REEVES, Lt(N) D. T., Ethnic Participation in the<br />
Canadian Forces: Demographic Trends................. 12<br />
105. ELIG, Timothy W., 1990 Army Career Satisfaction<br />
Survey............................................... 19<br />
106. DEMPSEY, J. R., HARRIS, D. A., and WATERS, A. K., The<br />
Use of Artificial Neural Networks in <strong>Military</strong><br />
Manpower Modeling.................................... 25<br />
107. EDWARDS, Jack E., ROSENFELD, Paul, and THOMAS,<br />
Patricia J., Hispanics in Navy's Blue-Collar Civilian<br />
Workforce: A Pilot Study............................ 31<br />
108. Not Presented.<br />
PAPER PRESENTATIONS - OCCUPATIONAL ANALYSIS<br />
201. WALKER, C. L., Descriptors of Job Specialization<br />
Based on Job Knowledge Tests......................... 37<br />
202. RHEINSTEIN, Julie, O'LEARY, Brian S., and MCCAULEY,<br />
Jr., Donald E., Addressing the Issues of<br />
"Quantitative Overkill" in Job Analysis.............. 51<br />
203. O'LEARY, Brian S., RHEINSTEIN, Julie, and MCCAULEY,<br />
Jr., Donald E., Developing Job Families Using<br />
Generalized Work Behaviors........................... 58<br />
204. O'LEARY, Brian S., RHEINSTEIN, Julie, and MCCAULEY,<br />
Jr., Donald E., A Comparison of Holistic and<br />
Traditional Job-Analytic Methods..................... 64<br />
205. HUDSPETH, Dr. DeLayne R., FAYFICH, Paul R., and<br />
PRICE, John S., Squadron Leader, Automating the<br />
Administration of USAF Occupational Surveys.......... 70<br />
206. MENCHACA, Capt Jose, Jr., GUTHALS, 2Lt Jody A.,<br />
OLIVIER, Lou, and PFEIFFER, Glenda, MPT Enhancements<br />
to the Occupational Research Data Bank............... 76<br />
207. PHALEN, William J., MITCHELL, Jimmy L., and HAND,<br />
Darryl K., ASCII CODAP: Progress Report on<br />
Applications of Advanced Analysis Software........... 82<br />
208. KLEIN, Paul, Professional Success of Former Officers<br />
in Civilian Occupations.............................. 88<br />
209. FINLEY, Dorothy L. and YORK, William J., Jr., A<br />
<strong>Military</strong> Occupational Specialty (MOS) Research and<br />
Development Program: Goals and Status............... 94<br />
210. YORK, William J., Jr. and FINLEY, Dorothy L.,<br />
Application of the Job Ability Assessment System to<br />
Communication Systems Operators...................... 99<br />
211. ARNDT, K., Preferences for <strong>Military</strong> Assignments in<br />
German Conscripts.................................... 104<br />
212. SCHAMBACH, S. B., Aptitude-Oriented Replacement of<br />
Conscript Manpower in the German Bundeswehr.......... 110<br />
213. VAUGHAN, David S., MITCHELL, Jimmy L., KNIGHT, J. R.,<br />
BENNETT, Winston R., and BUCKENMYER, David V.,<br />
Developing a Training Time and Proficiency Model for<br />
Estimating Air Force Specialty Training Requirements<br />
of New Weapon Systems................................ 116<br />
214. Not Presented.<br />
PAPER PRESENTATIONS - TRAINING<br />
301. MCCORMICK, D. L. and JONES, P. L., Evaluating<br />
Training Program Modifications....................... 122<br />
302. Not Presented.<br />
303. Not Presented.<br />
304. DIEHL, Grover E., The Effect of Reading Difficulty on<br />
Correspondence Course Performance.................... 128<br />
305. PARCHMAN, Steve W., ELLIS, John A., and MONTAGUE,<br />
William E., Navy Basic Electricity Theory Training:<br />
Past, Present, and Future............................ 132<br />
306. Not Presented.<br />
307. Not Presented.<br />
308. STEPHENSON, S. D. and STEPHENSON, J. A., Using Event<br />
History Techniques to Analyze Task Perishability: A<br />
Simulation........................................... 138<br />
309. STEPHENSON, S. D., A First Look at the Effect of<br />
Instructor Behavior in a Computer-Based Training<br />
Environment.......................................... 144<br />
310. BESSEMER, D. W., Transfer of Training with Networked<br />
Simulators........................................... 150<br />
311. Not Presented.<br />
312. DART, 1Lt Todd S., GUTHALS, 2Lt Jody A., and<br />
BERGQUIST, Maj Timothy M., Contingency Task Training<br />
Scenario Generator................................... 156<br />
313. MIRABELLA, Angelo, Cooperative Learning in the Army:<br />
Research and Application............................. 162<br />
314. EGGENBERGER, J. C., PhD, and CRAWFORD, R. L., PhD,<br />
Battle-Task/Battleboard Training Application Paradigm<br />
and Research Design.................................. 168<br />
315. LICKTEIG, Carl W., KOGER, Major Milton E., and<br />
HESLIN, Captain Thomas F., Combat Vehicle Commander's<br />
Situational Awareness: Assessment Techniques........ 174<br />
316. FEHLER, F., An Aviation Psychological System for<br />
Helicopter Pilot Selection and Training.............. 180<br />
317. SPECTOR, J. M. and MURAIDA, D. J., Analyzing User<br />
Interaction with Instructional Design Software....... 185<br />
318. PFEIFFER, M. G. and EVANS, R. M., Forecasting<br />
Training Effectiveness (FORTE)....................... 191<br />
319. PHELPS, Dr. Ruth H. and ASHWORTH, MAJ Robert L., Jr.,<br />
Cost-Effectiveness of Home Study Using Asynchronous<br />
Computer Conferencing for Reserve Component<br />
Training............................................. 199<br />
PAPER PRESENTATIONS - TESTING<br />
401. RUDOLPH, Sandra A., Test Design and Minimum Cutoff<br />
Scores............................................... 204<br />
402. KOBRICK, J. L., JOHNSON, R. F., and MCMENEMY, D. J.,<br />
Subjective and Cognitive Reactions to Atropine/2-PAM,<br />
Heat, and BDU/MOPP-IV................................ 210<br />
403. LESCREVE, F. and SLOWACK, W., Guts: A Belgian Gunner<br />
<strong>Testing</strong> System....................................... 216<br />
404. Not Presented.<br />
405. Not Presented.<br />
406. KENNEDY, R. S., DUNLAP, W. P., FOWLKES, J. E., and<br />
TURNAGE, J. J., Characterizing Responses to Stress<br />
Utilizing Dose Equivalency Methodology............... 220<br />
407. Not Presented.<br />
408. ARABIAN, Jane M. and SCHWARTZ, Amy C., Job Sets for<br />
Efficiency in Recruiting and Training (JSERT)........ 226<br />
409. THAIN, John W., Development of a New Language<br />
Aptitude Battery..................................... 231<br />
410. WILLIAMS, J. E., STANLEY, P. P., and PERRY, C. M.,<br />
Implementation of Content Validity Ratings in Air<br />
Force Promotion Test Construction.................... 235<br />
411. JEZIOR, B. A., POPPER, R., LESHER, L. L., GREENE, C.<br />
A., and INCE, V., Interpreting Rating Scale Results:<br />
What Does a Mean Mean?............................... 241<br />
412. SANDS, W. A., Joint-Service Computerized Aptitude<br />
Testing.............................................. 245<br />
413. Not Presented.<br />
414. O'BRIEN, L. H., Assessment of Aptitude Requirements<br />
for New or Modified Systems.......................... 251<br />
415. Presented in Symposium 803D.<br />
416. SCHWARTZ, Amy C. and SILVA, Jay M., The Practical<br />
Impact of Selecting TOW Gunners with a Psychomotor<br />
Test................................................. 256<br />
417. BRADLEY, Capt. J. P., Validation of a Naval Officer<br />
Selection Board...................................... 262<br />
418. Not Presented.<br />
419. HANSON, Mary Ann, and BORMAN, Walter C.,<br />
A Situational Judgment Test of Supervisory<br />
Knowledge in the U.S. Army........................... 268<br />
420. Presented in Symposium 803B.<br />
421. Not Presented.<br />
422. BUCK, Lawrence S., Context Effects on Multiple-<br />
Choice Test Performance.............................. 274<br />
423. SALTER, MAJ Charles A., LESTER, Laurie S., LUTHER,<br />
Susan M., and LUISI, Theresa A., Dietary Effects on<br />
Test Performance..................................... 280<br />
424. MAEL, F. A., What Makes Biodata Biodata?............. 286<br />
425. VAN HEMEL, S., ALLEY, F., BAKER, H., and SWIRSKI, L.,<br />
Job Sample Test for Navy Fire Controlman............. 292<br />
426. BAKER, H., SANDS, M., and SPOKANE, A., ASVIP: An<br />
Interest Inventory Using Combined Armed Services<br />
Jobs................................................. 298<br />
427. SPIER, M., DHAMMANUNGUNE, S., BAKER, H., and SWIRSKI,<br />
L., Predicting Performance with Biodata.............. 304<br />
428. ALBERT, W. G. and PHALEN, W. J., Development of<br />
Equations for Predicting <strong>Testing</strong> Importance of<br />
Tasks................................................ 310<br />
429. DITTMAR, Martin J., HAND, Darryl K., PHALEN, William<br />
J., and ALBERT, W. G., Estimating <strong>Testing</strong> Importance<br />
of Tasks by Direct Task Factor Weighting............. 316<br />
430. Not Presented.<br />
431. BRADY, Elizabeth J. and RUMSEY, Michael G., Upper<br />
Body Strength and Performance in Army Enlisted MOS... 322<br />
432. PALMER, D. R., WHITE, L. A., and YOUNG, M. C.,<br />
Response Distortion on the Adaptability Screening<br />
Profile (ASP)........................................ 328<br />
433. BANDERET, L. E., SHUKITT-HALE, B. L., LIEBERMAN, H.<br />
R., SIMPSON, LTC R. L., and PEREZ, CPT P. J.,<br />
Psychometric Properties of a Number Comparison Task:<br />
Medium and Format Effects............................ 334<br />
434. BANDERET, L. E., O'MARA, M., PIMENTAL, N. A., RILEY,<br />
SGT R. H., DAUPHINEE, SSG D. T., WITT, SSG C. E., and<br />
TOYOTA, SGT R. M., Subjective States Questionnaire:<br />
Perceived Well-Being and Functional Capacity......... 339<br />
435. ROMAGLIA, CIC Diane L. and SKINNER, Jacobina,<br />
Validity of Grade Point Average: Does the College<br />
Make a Difference?................................... 345<br />
436. Not Presented.<br />
437. HANSEN, H. D., Flight Psychological Selection<br />
System - FPS-80: A New Approach to the Selection<br />
of Aircrew Personnel................................. 351<br />
438. MELTER, A. H. and MENTGES, W., Leadership in Aptitude<br />
Tests and in Real-Life Situations.................... 357<br />
439. PUTZ-OSTERLOH, W., Computer-based Assessment of<br />
Strategies in Dynamic Decision Making................ 362<br />
440. RODEL, G., The "Information and Counseling Action"<br />
(IBA) of the German Navy............................. 368<br />
441. CONNER, Dr. Harry B., Troubleshooting Assessment and<br />
Enhancement (TAE) Program: Test and Evaluation<br />
Results.............................................. 372<br />
442. BUSCIGLIO, Henry H., Incrementing ASVAB Validity with<br />
Spatial and Perceptual-Psychomotor Tests............. 380<br />
443. RUSHANO, T. M., Item Content Validity: Its<br />
Relationship with Item Discrimination and<br />
Difficulty........................................... 386<br />
444. FIEDLER, E., The Air Force Medical Evaluation Test,<br />
Basic <strong>Military</strong> Training, and Character of<br />
Separation........................................... 392<br />
445. TRENT, T., QUENETTE, M. A., and LAMBS, G. J.,<br />
Implementation of the Adaptability Screening Profile<br />
(ASP)................................................ 398<br />
446. MCGEE, Steve D., Utilization of Word<br />
Processors/Computers vs Typewriter for U.S. Navy<br />
Typing Performance Tests............................. 404<br />
PAPER PRESENTATIONS - HUMAN FACTORS<br />
501. THARION, W. J., MARLOWE, B. E., KITTREDGE, R., HOYT,<br />
R., and CYMERMAN, A., Acute High Altitude Exposure<br />
and Exercise Decrease Marksmanship Accuracy.......... 408<br />
502. Not Presented.<br />
503. COLLINS, Dennis D., Human Performance Data for Combat<br />
Models............................................... 414<br />
504. TURNAGE, Janet J., KENNEDY, Robert S., and JONES,<br />
Marshall B., Trading Off Performance, Training, and<br />
Equipment Factors to Achieve Similar Performance..... 419<br />
505. BAYES, Andrew H., Final Report, Computer Assisted<br />
Guidance Information Systems......................... 425<br />
PAPER PRESENTATIONS - LEADERSHIP<br />
601. Not Presented.<br />
602. ALDERKS, Cathie E., Vertical Cohesion Patterns in<br />
Light Infantry Units................................. 432<br />
603. LINDSAY, Twila J. and SIEBOLD, Guy L., The Use of<br />
Incentives in Light Infantry Units................... 438<br />
604. SIEBOLD, Guy L., Cohesion in Context................. 444<br />
605. WALDKOETTER, R. O., WHITE, W. R., Sr., and VANDIVIER,<br />
P. L., Evaluation of the Army's Finance Support<br />
Command Organizational Concept....................... 450<br />
606. STEINBERG, Alma G. and LEAMAN, Julia A., Leader<br />
Initiative: From Doctrine to Practice............... 455<br />
607. Not Presented.<br />
608. Not Presented.<br />
609. Not Presented.<br />
610. CLARK, Herbert J., Starting a TQM Program in an R&D<br />
Organization......................................... 460<br />
PAPER PRESENTATIONS - MISCELLANEOUS TOPICS<br />
701. Not Presented.<br />
702. ROOZENDAAL, Col. G. J. C., An Officer, a Social<br />
Scientist, (and possibly a gentleman) in the Royal<br />
Netherlands Army (RNLA).............................. 466<br />
703. GOLDBERG, Edith Lynne, SHEPOSH, John P., and<br />
SHETTEL-NEUBER, Joyce, Acceptance of Change: An<br />
Empirical Test of a Causal Model..................... 474<br />
PAPER PRESENTATIONS - SYMPOSIA (ALL CATEGORIES)<br />
801. TWEEDDALE, J. W., Symposium: The Naval Reserve<br />
Officers Training Corps (NROTC) Scholarship<br />
Selection System.................................... 480<br />
801A. TWEEDDALE, J. W., Research Needs for Naval Reserve<br />
Officers Training Corps Scholarship Selection....... 480<br />
801B. HAWKINS, R. B., Gathering and Using Naval Reserve<br />
Officers Training Corps Scholarship Information..... 481<br />
801C. EDWARDS, Jack E., BURCH, Regina L., and ABRAHAMS,<br />
Norman M., Validation of the Naval Reserve Officers<br />
Training Corps Quality Index........................ 486<br />
801D. BORMAN, Walter C., OWENS-KURTZ, C. K., and RUSSELL,<br />
T. L., Development and Implementation of a<br />
Structured Interview Program for NROTC Selection.... 492<br />
801E. HANSON, Mary Ann, PAULLIN, Cheryl, and BORMAN,<br />
Walter C., Development of an Experimental<br />
Biodata/Temperament Inventory for NROTC Selection... 498<br />
802. 802 through 8025 Not Presented.<br />
803. BORMAN, W., BOSSHARDT, M., DUBOIS, D., HOUSTON, J.,<br />
CRAWFORD, K., WISKOFF, M., ZIMMERMAN, R., and<br />
SHERMAN, F., Psychological Applications to Ensuring<br />
Personnel Security: A Symposium.................... 504<br />
803A. DUBOIS, D., BOSSHARDT, M., and WISKOFF, M., The<br />
Investigative Interview: A Review of Practice and<br />
Research............................................ 505<br />
803B. ZIMMERMAN, R. A. and WISKOFF, M. F., Utility of a<br />
Screening Questionnaire for Sensitive <strong>Military</strong><br />
Occupations......................................... 511<br />
803C. BOSSHARDT, M., DUBOIS, D., and CRAWFORD, K.,<br />
Continuing Assessment of Cleared Personnel in the<br />
<strong>Military</strong> Services................................... 516<br />
803D. HOUSTON, J., WISKOFF, M., and SHERMAN, F.,<br />
A Measure of Behavioral Reliability for Marine<br />
Security Guards..................................... 522<br />
804. HARRIS, J. H., CAMPBELL, Charlotte H., and CAMPBELL,<br />
Roy C., Symposium: Job Performance <strong>Testing</strong> for<br />
Enlisted Personnel.................................. 528<br />
804A. DOYLE, Earl L. and CAMPBELL, R. C., Navy: Hands-On<br />
and Knowledge Tests for the Navy Radioman........... 529<br />
804B. EXNER, Maj P. J., CRAFTS, J. L., FELKER, D. B.,<br />
BOWLER, E. C., and MAYBERRY, P. W., Interrater<br />
Reliability as an Indicator of HOPT Quality Control<br />
Effectiveness....................................... 535<br />
804C. Not Presented.<br />
804D. CAMPBELL, Charlotte H. and CAMPBELL, Roy C., Army:<br />
Job Performance Measures for Non-Commissioned<br />
Officers............................................ 541<br />
805. BROOKS, J. T., CALE, W. J., HARRIS, J. C., STANLEY<br />
II, P. P., and TARTELL, J. S., The USAF Occupational<br />
Measurement Squadron: Its Organization, Products,<br />
and Impact.......................................... 547<br />
PAPER PRESENTATIONS - VENDOR PRESENTATIONS<br />
901. BROWN, Gary C., The Examiner........................ 553<br />
CONFERENCE INFORMATION<br />
MINUTES OF THE STEERING COMMITTEE MEETING.................. 560<br />
LIST OF STEERING COMMITTEE MEETING ATTENDEES............... 562<br />
AGENCIES REPRESENTED BY MEMBERSHIP ON THE<br />
MTA STEERING COMMITTEE................................. 563<br />
BY-LAWS OF THE MILITARY TESTING ASSOCIATION................ 568<br />
LIST OF CONFERENCE REGISTRANTS............................. 573<br />
INDEX OF AUTHORS........................................... 585<br />
32nd Annual Conference of the <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong><br />
Orange Beach, Alabama<br />
5 November 1990<br />
OPENING SESSION<br />
Opening Remarks: Commander Mary A. Adams, Head, Naval<br />
Advancement Center Department, Naval Education and Training<br />
Program Management Support Activity; Pensacola, Florida<br />
Welcome: Mr. George W. Tate, Executive Vice President, Orange<br />
Beach Chamber of Commerce; Orange Beach, Alabama<br />
Keynote Address: Lt General Donald W. Jones, Deputy Assistant<br />
Secretary of Defense (<strong>Military</strong> Manpower and Personnel Policy)<br />
THE CANADIAN RESERVES: CURRENT AND FUTURE MANPOWER*<br />
Susan R. Truscott<br />
Directorate of Social and Economic Analysis<br />
Operational Research and Analysis Establishment<br />
Department of National Defence<br />
Ottawa, Canada<br />
BACKGROUND<br />
In 1987, the Canadian White Paper on Defence outlined<br />
numerous policy changes for the Canadian Forces. One of these<br />
was the Total Force Concept. In brief, it stated that the<br />
distinction between the regular force and the reserves is to be<br />
reduced and the responsibility for national defence is to be<br />
shared. To fulfil its commitments, Canada must look to a<br />
peacetime structure that can be rapidly and effectively<br />
augmented by a trained reserve force composed of part-time<br />
members. A mixed operational force is to be formed, where<br />
regular and reserve force personnel are integrated in units.<br />
The ratio of full-time to part-time personnel will be dependent<br />
on the nature and requirements of the unit.<br />
Currently, regular force members outnumber reservists by a<br />
ratio of more than three to one. To assume a greater role in<br />
the defence of Canada, the reserves are to be revitalized and<br />
expanded. The recruitment of a large number of reservists, and<br />
perhaps different types of reservists, over the next decade<br />
will present a challenge to the reserves, in light of current<br />
socio-demographic and economic trends such as a declining youth<br />
population and broader employment opportunities. Recruiting the<br />
required number of reservists may necessitate new initiatives -<br />
for example, the widening of the traditional recruiting<br />
population and the engagement of new recruiting and advertising<br />
strategies.<br />
Several studies have been undertaken to provide data on a<br />
force that has, at least from a research point of view, been<br />
largely ignored in recent years. The focus of this paper is on<br />
a three phase study of the Primary Reserves, conducted by the<br />
Directorate of Social and Economic Analysis. During Phase One,<br />
qualitative information was collected through interviews with<br />
key reserve personnel. In Phase Two, a survey was administered<br />
to a random sample of reservists to identify the<br />
characteristics, attitudes and values of reservists. The study<br />
also focused on retention and the internal organization of the<br />
reserves. A national attitude survey of 6000 Canadians was<br />
conducted, in Phase Three, to assess knowledge of the reserves,<br />
attitudes toward the reserves and the propensity of Canadians<br />
to join the reserves. Preliminary results of this study, and<br />
* The views and opinions expressed in this paper are<br />
those of the author and not necessarily those of the<br />
Department of National Defence.<br />
their implications in light of socio-demographic trends in the<br />
Canadian population and organizational changes planned for the<br />
reserves, are highlighted in this paper. A profile of<br />
reservists is presented first. This is followed by data on the<br />
Canadian public's knowledge of, and attitudes toward, the<br />
reserves.<br />
FINDINGS<br />
A. SURVEY OF RESERVISTS<br />
The reserves are dominated by young, single males. At the<br />
time of the survey, thirty-one percent of the reservists were<br />
students and 18% were unemployed. Together, these two groups<br />
comprise almost one-half of the reserves. Of the remaining 51%<br />
who were employed, about 24% were Class B or C Reservists, and<br />
thus in continuous full-time employment with the military. In<br />
comparison to 1976, there has been only a modest change in the<br />
percentage who are employed. However, there has been a<br />
substantial increase in the percentage who are unemployed and a<br />
decrease in the percentage who are students. This is an<br />
indication of how closely tied reserve recruitment and<br />
retention is to the employment situation in the Canadian<br />
economy, and in particular the regional economy - relationships<br />
well documented in regular force research.<br />
The reserves are attracting and/or retaining more personnel<br />
who have or are achieving post-secondary education. Of the<br />
reservists who were attending school in 1976, 66% were in high<br />
school, 17% were in college and 16% were enrolled in university.<br />
The recent survey indicated that 50% of the students were in<br />
high school, 20% were in college and 30% were in university.<br />
The increase in reservists with, or attaining post-secondary<br />
education reflects the greater emphasis on education in<br />
society, the greater technical demands in some areas of the<br />
forces, and the use of reserve activity to subsidize<br />
post-secondary education costs.<br />
Many reservists have prior experience with the military -<br />
forty percent had been members of the cadets and 20% had<br />
previously been in another reserve unit. Ten percent of primary<br />
reservists had served in the regular force. Ex-service members<br />
provide expertise and training that is difficult, if not<br />
impossible, to recruit from the civilian work force. There are<br />
some 65,000 ex-regular force members who would be suitable for,<br />
but are not members of the reserves (Bossenmaier, 1987).<br />
Our study indicated that word of mouth was the most common<br />
first source of information on the reserves. Only 7% of<br />
reservists reported that formal advertising had provided their<br />
first information on the reserves. National advertising<br />
campaigns have not been the focus of reserve recruiting in the<br />
past; however, they are an effective means of directing a<br />
specific message to a target population. Indeed they may<br />
provide a very functional mechanism to enhance public awareness<br />
of the reserves.<br />
proportion of the population benefit from receiving some<br />
military training and experience and be of use to the military<br />
if mobilization is required. In addition, they contribute to<br />
the "Defence Community" in Canada; that is, sub-groups of<br />
Canadians with military knowledge and experience and an<br />
understanding of the Defence mandate. The reserves have both<br />
organizational and societal responsibilities; thus, public<br />
relations campaigns should be designed to appeal to those who<br />
view the reserves as a part-time job and to those who view it<br />
as a professional calling.<br />
Based on estimates from Statistics Canada, the general<br />
population will continue to age due to birth rates below<br />
replacement levels. It is expected, however, that the growth<br />
rate in the country will be maintained through increased<br />
immigration. These projections carry major implications for the<br />
reserves. A decline in the youth population means that the<br />
traditional recruiting base will shrink and that the reserves<br />
may increasingly have to rely on women, older persons, the<br />
employed, ex-regular force members and first generation<br />
Canadians to fill its ranks. These are all subgroups of the<br />
population currently under-represented in the reserves. By<br />
extending its age restrictions, the reserves may encourage<br />
older members in certain trades to stay.<br />
B. NATIONAL ATTITUDE SURVEY<br />
The National Attitude Survey was administered to 6000<br />
Canadians between the ages of 15 and 50 to assess the level of<br />
awareness of the reserves, attitudes toward the reserves, and<br />
the propensity of various sub-groups of Canadians to join the<br />
reserves. Eighty percent of those interviewed had heard of the<br />
reserves, but few admitted to having a great deal of awareness of<br />
the reserves or their activities. In fact, just over 40% of<br />
those interviewed said they were not at all or not very aware<br />
of the reserves or their activities.<br />
Many Canadians reported that word of mouth was their most<br />
significant source of information on the reserves. Forty-five<br />
percent of Canadians reported that friends, family members,<br />
relatives or teachers were their main source for information<br />
about the reserves. The media were reported as the most<br />
significant source for 42% of those interviewed; the media were thus far more<br />
important for the general population than for reserve members.<br />
Twenty percent of those interviewed, and 25% of those 15-<br />
24 years of age, had considered joining the reserves within the<br />
previous year. Addressing the future, about 5% of the sample<br />
said that they were somewhat to very likely to consider<br />
joining the reserves. This was the case for 10% of those aged<br />
15-24. Interest in joining the reserves was highest among 15-24<br />
year olds in the Atlantic provinces and Quebec. Those who<br />
indicated a willingness to join the reserves most frequently<br />
cited patriotic reasons. Monetary/work experience<br />
reasons, followed by social reasons, were also common responses.<br />
Older persons were more likely to report patriotic reasons for<br />
interest in the reserves, while pragmatic reasons were more<br />
common among the young. A lack of interest, family, school and<br />
work responsibilities, and age were the most common reasons<br />
provided by those not interested in joining the reserves.<br />
SUMMARY<br />
In summary, the reserves are still very dependent on young<br />
Canadians to fill their ranks. While there are benefits in<br />
recruiting from this sub-group of the population, there are<br />
also considerable drawbacks. The historical exclusion of<br />
students from mobilization, high attrition rates, and the<br />
resulting continual training requirements are examples. With<br />
little doubt, attrition rates will remain high<br />
among the reserves. Principally, the factors that draw<br />
reservists out of the reserves are related to their age and<br />
stage of life. This suggests that attrition may be<br />
reduced by attracting a different type of individual to the<br />
reserves, such as older persons, ex-regular force members or<br />
the civilian employed. Further efforts will be made to explore<br />
attitudes toward the reserves, and the propensity to join,<br />
across sub-groups of the population currently under-represented<br />
in the reserves, as well as factors which may currently limit<br />
or restrict participation. Reserve manpower and related manning<br />
issues should be reviewed in light of the new policy role for<br />
the reserves.<br />
REFERENCES<br />
Bossenmaier, G. (1987). Potential Manpower Resources For<br />
Mobilization Part 1 (ORAE Project Report No. PR434).<br />
Ottawa, Ontario: Directorate of Manpower Analysis.<br />
Goodfellow, T.H. (1976). Reserve Force Survey (ORAE Project<br />
Report No. PR62). Ottawa, Ontario: Directorate of Social<br />
and Economic Analysis.<br />
Popoff, T. and Truscott, S. (1987). A Sociological Study of<br />
the Reserves: Phase Two Trends and Implications for the<br />
Future (ORAE Project Report No. PR440). Ottawa, Ontario:<br />
Directorate of Social and Economic Analysis.<br />
Sinaiko, W. H. (1985). Part-Time Soldiers, Sailors and Airmen,<br />
Reserve Force Manpower in Australia, Canada, New Zealand,<br />
the U. K. and the U.S. (Technical Panel 3 Report (UTP-3)).<br />
Washington, DC: The Technical Cooperation Program,<br />
Subgroup U.<br />
Truscott, S. (1987). A Sociological Study of the Reserves:<br />
Phase Two Summary of Research Findings (DSEA Staff Note<br />
No. 4/88). Ottawa, Ontario: Directorate of Social and<br />
Economic Analysis.<br />
Martell and LTC Dennis Winn<br />
Department of the Army Headquarters,<br />
Office of the Deputy Chief of Staff for Personnel<br />
Pentagon, Arlington VA<br />
In order for the recruiting command (USAREC) to achieve its aggregate accession<br />
mission, there must also be specific MOS requirements to match the accession mission.<br />
The personnel command (PERSCOM) develops these MOS requirements. This paper<br />
will briefly describe the process and interactions among the various systems currently used<br />
to get the right number and mix of soldiers to support the Army’s end strength<br />
requirements. This paper will define the challenges, exacerbated by the current changing<br />
environment, faced by the models and programs at the MOS and aggregate levels of detail.<br />
Timing (planning/forecasting), structure versus MOS program reductions, training capacity,<br />
and the effects of abrupt execution year accession changes will be covered. A brief<br />
description of some specific accession policies will also be addressed. In addition, potential<br />
accommodations by the system to changing demands will be presented.<br />
MOS Level of Detail<br />
Under current procedures, all Army enlistees are assigned a job or a <strong>Military</strong><br />
Occupational Specialty (MOS) upon initially contracting. Thus, for Army Recruiting<br />
Command (USAREC) to achieve its aggregate accession mission, there must also be<br />
specific MOS requirements to match the accession mission. These MOS requirements by<br />
grade and quantity are developed centrally at the Army’s Personnel Command (PERSCOM)<br />
and in the aggregate are MOS programs.<br />
MOS requirements are identified using a planning model called MOSLS (MOS Level<br />
System). Inputs to MOSLS include: AAMMP (Active Army <strong>Military</strong> Manpower Program)<br />
developed from the ELIM-COMPLIP (Enlisted Loss Inventory-Computation of Manpower<br />
Using Linear Programming), projected authorization data from the PMAD (Personnel<br />
Management Authorization Document) or UAD (Updated Authorization Data), and<br />
inventory data from the EMF (Enlisted Master File).<br />
MOSLS then determines the recommended MOS and grade mix for the MOS<br />
inventories, the gains to the MOS required to meet those inventories, and the training<br />
needed to support those gains. While some of the gains to the MOS will come through<br />
reenlistments and reclassifications, the majority will come through USAREC’s accession<br />
mission. These accessions are referred to as the MOS programs.<br />
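The inventory accounting MOSLS performs can be illustrated with a toy calculation. This is only a sketch of the relationship described above (gains needed to reach a target inventory, less gains from reenlistment and reclassification); the function name and sample numbers are assumptions, and the real model also optimizes grade mix and the training pipeline:

```python
# Illustrative sketch of the MOSLS accounting described above: gains needed
# to reach a target MOS inventory, with the accession share left over after
# reenlistments and reclassifications are counted. Names and numbers are
# hypothetical; the actual MOSLS model is far more detailed.
def accession_requirement(target_inventory: int,
                          projected_inventory: int,
                          reenlistments: int,
                          reclassifications_in: int) -> int:
    """Accessions needed for one MOS after other gain sources are counted."""
    gains_needed = max(0, target_inventory - projected_inventory)
    other_gains = reenlistments + reclassifications_in
    return max(0, gains_needed - other_gains)

# Hypothetical MOS: need 1,200, project 1,000 on hand,
# 60 reenlist and 40 reclassify in.
print(accession_requirement(1_200, 1_000, 60, 40))  # 100
```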
Paper presented at the 32nd Annual Conference of the <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>, November 1990.<br />
Training to support these programs is obtained through the Structure Manning Decision<br />
Review (SMDR) process. The SMDR is held annually and allows each of the Army<br />
components (Active Army, Reserves, and National Guard) to express the training needed<br />
to support its MOS programs. These program requirements are evaluated against the<br />
training capacity in the Training and Doctrine Command (TRADOC) and, once approved,<br />
become TRADOC’s training mission. The approved training requirements are referred to<br />
as the Army Program for Individual Training (ARPRINT), which identifies for the individual<br />
TRADOC schools what their training mission is for the fiscal year.<br />
The TRADOC schools in turn develop individual class schedules to support their<br />
training mission. These class schedules are placed into the Army Training Requirements<br />
and Resources System (ATRRS) and ultimately into the automated accessioning system,<br />
REQUEST. The total of the “seats” in the classes for a particular MOS for the year is equal<br />
to the MOS programs developed through MOSLS and approved in the SMDR. USAREC<br />
recruits against these classes and by filling the individual class “seats” also fills the annual<br />
MOS programs.<br />
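The seat accounting described above amounts to a simple consistency rule: a year's class seats for an MOS should sum to its annual program. A minimal sketch (the function name and class sizes are hypothetical):

```python
# Consistency rule stated above: the total of the "seats" in a year's
# classes for an MOS equals the annual MOS program. Class sizes here
# are hypothetical.
def seats_match_program(class_seats: list[int], annual_program: int) -> bool:
    """True when scheduled class seats exactly cover the annual MOS program."""
    return sum(class_seats) == annual_program

print(seats_match_program([20] * 6, 120))  # True
```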
The above process works well in a stable, predictable environment; however, as seen<br />
in recent years and especially now with the uncertainties of reducing the manpower in the<br />
Army or “downsizing”, the environment is anything but stable or certain. Discussed below<br />
are some of the problems encountered in managing accessions during these unique times.<br />
Timing. The SMDR works in the future. For example, the SMDR held in April and<br />
May 1990 built the FY93 training programs. Although FY92 was revalidated and FY94 was<br />
given a first look, the major work was on FY93 and it is that year’s training which will be<br />
approved in the ARPRINT in the summer of 1990. Projections in the best of circumstances<br />
are chancy; in a downsizing environment, the training that is “bought” and approved in<br />
FY90 may no longer reflect the requirements when FY93 finally arrives. Critical to the<br />
MOSLS process are the known and projected authorizations (PMAD) based on projected<br />
force structure. If the structure changes then the MOS requirements change, and thus the<br />
training requirements. While there are mechanisms to make adjustments to the training<br />
programs, because the SMDR is so closely tied to the budget and resourcing process,<br />
significant changes may not be satisfied in a timely manner.<br />
Structure Reductions. PERSCOM can adjust its MOS programs throughout the year<br />
to match the accession mission changes. Generally, these changes have been reductions in<br />
USAREC’s mission. While the Deputy Chief of Staff for Personnel (DCSPER) Accession<br />
Division can easily reduce the aggregate requirement, PERSCOM cannot reduce the<br />
supporting MOS programs without knowing what changes are being made in structure.<br />
Experience in the past and currently is that decisions on structure reductions lag behind<br />
decisions to reduce the accession missions. PERSCOM then is left with a couple of<br />
alternatives: make a best guess, in coordination with the Office of Deputy Chief of Staff<br />
for Operations (ODCSOPS), on what structure is coming out and adjust accordingly, [but<br />
the risk is that the guess will be wrong and irreversible decisions on MOS level accessions<br />
will have been made]; or leave the MOS programs untouched, with the result being more<br />
available program and training than there is accession mission to support. If the accession<br />
mission is 1,000 but there are 2,000 MOS programs available, only 1,000 will be recruited.<br />
With that excess, we allow USAREC and the applicant to dictate what MOS programs are<br />
filled. The risk is that the wrong MOS programs will be filled.<br />
Training Capacity. The training that was approved in the SMDR was at the annual<br />
level. The individual school or installation must convert that requirement to class schedules<br />
and spread the requirement across the year. Generally, that spread will be made on a<br />
straight line, consistent with the capacity in each course. For example, if the requirement<br />
for an MOS is 120 with a class optimum size of 20, the TRADOC school will likely<br />
schedule 6 classes conducted every other month. The concept is the same for basic training.<br />
While TRADOC does have some surge capacity, this straight line scheduling is a reflection<br />
of the fact that TRADOC is budgeted and manned on an annual basis. The physical plant<br />
(e.g. billets) and training equipment (e.g. simulators, tanks) may also dictate a straight line<br />
schedule with limited surge capability.<br />
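The straight-line spread can be sketched as follows, reproducing the text's example of a 120-soldier annual requirement and 20-seat classes; the function name and the month-rounding rule are assumptions:

```python
# Straight-line class scheduling as in the text's example: an annual MOS
# requirement of 120 with an optimum class size of 20 yields 6 classes
# spread evenly across the year, i.e. every other month. A minimal sketch;
# the even-spacing rule is an assumption.
import math

def straight_line_schedule(annual_requirement: int, class_size: int) -> list[int]:
    """Return the fiscal months (1-12) in which classes start, spread evenly."""
    n_classes = math.ceil(annual_requirement / class_size)
    spacing = 12 / n_classes
    return [round(i * spacing) + 1 for i in range(n_classes)]

print(straight_line_schedule(120, 20))  # [1, 3, 5, 7, 9, 11]
```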
TRADOC’s capability in recent years has been stretched to the limit. Faced with<br />
structure and budget cuts itself, TRADOC has recently indicated that it can no longer surge.<br />
They have requested HQDA’s support in effecting a more even flow into the training base<br />
with about 35% of the annual training capacity in the 4th quarter. While this should allow<br />
all three components (Active Army, Reserves, and National Guard) to still meet their<br />
missions while taking advantage of the prime summer recruiting months, the ability of the<br />
Army to slide the accession mission into the 4th Quarter to save <strong>Military</strong> Personnel Account<br />
(MPA) dollars will be restricted.<br />
Execution Year Changes. When the mission is slid to the 4th Quarter to save dollars,<br />
that shift can cause critical training seats to be missed which perhaps cannot be made up<br />
in the future. Also, because of the way MOS programs are counted, shifting the mission<br />
into August or September can take that mission “across the training year” line into the next<br />
training year. This is because MOS programs are based on the start date of the MOS<br />
producing course. Thus, an accession in an Advanced Individual Training (AIT) course for<br />
an MOS in August 1990 counts against the FY91 program because the MOS course will<br />
start in October, 8 weeks after the soldier accessed and entered the 8 week Basic Training<br />
(BT) course; for MOS with One Station Unit Training (OSUT), BT and AIT are merged<br />
so that when the soldier accesses and enters OSUT, he or she is starting the MOS producing<br />
course. The result is too low a mission to support the MOS programs in one year and<br />
excess mission for the available training in the subsequent “training program year”.<br />
PERSCOM must then take action to align the aggregate programs with the new mission by<br />
reducing the program level. These MOS program reductions can have the same adverse<br />
effect discussed above in Structure Reductions.<br />
Aggregate Level of Detail<br />
Input to the ELIM is made during the budget process and includes expected<br />
requirements for each of the months of the year/years being developed. Key considerations<br />
in this process are: recruiting capability, budget, and training capability.<br />
Recruiting capability is defined as what USAREC believes it can handle for each<br />
month, quarter, and annual mission. The aggregate numbers for each of these periods are<br />
developed from contract recruiting history and not necessarily accession history. The<br />
contract capability is developed by looking at what USAREC expects it can achieve in the<br />
various specific mission categories: combinations of gender, nonprior service/prior service,<br />
high school graduation status, and AFQT (the Armed Forces Qualification Test, part of the<br />
Armed Services Vocational Aptitude Battery or ASVAB). Contracting missions to the<br />
USAREC commanders in the field takes a 6 month lead time and is, in the aggregate,<br />
influenced by the accession mission. However, the accession flow to the training base is<br />
controlled by USAREC headquarters and at the Department of the Army level. In other<br />
words, the recruiting commanders are not greatly influenced by gyrations in the monthly<br />
accession requirements.<br />
The market, incentives, recruiter tools, and the economic environment, but<br />
especially recent recruiting history, drive the aggregate numbers. USAREC deals with<br />
Reception Station Months (RSM) vice Calendar months. The RSM allows flexibility for<br />
shipping recruits to the 8 (7 after closing of FT. Bliss Reception Battalion) reception<br />
battalions. Conversion of RSM to calendar numbers is not an exact science since it is<br />
based on the projected rate of shipping during each RSM week. The conversion is important<br />
to identify the costs of each calendar month, which are determined by manyear costs prorated<br />
by months of the year. The present dollar amount for one full manyear is $17,657. Thus if 5,000<br />
recruits are shipped (now referred to as accessions) in March, the sixth month of the fiscal<br />
year, this number equates to 5,000 x 6/12, or 2,500 manyears; at $17,657 each, this is $44,142,500<br />
in MPA cost. On the whole the conversion has been relatively accurate.<br />
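The proration above can be checked with a short calculation. This sketch reads the 6/12 factor in the March example as the fraction of the fiscal year remaining after the shipping month, an interpretation consistent with the later point that shifting accessions to the 4th quarter saves MPA dollars; the function name is an assumption:

```python
MANYEAR_COST = 17_657  # dollars per full manyear, as quoted in the text

def mpa_cost(accessions: int, fiscal_month: int) -> int:
    """Prorated MPA cost for recruits shipped in fiscal month 1-12 (Oct = 1).

    The 6/12 proration in the text's March example is read here as the
    fraction of the fiscal year remaining after the shipping month.
    """
    months_remaining = 12 - fiscal_month
    return round(accessions * months_remaining / 12 * MANYEAR_COST)

# March, the sixth month of the fiscal year: 5,000 x 6/12 x $17,657
print(mpa_cost(5_000, 6))  # 44142500
```

Note that under this reading, the later a recruit ships in the fiscal year, the smaller the current-year MPA charge, which is exactly the savings mechanism discussed below.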
Budgetary requirements to save <strong>Military</strong> Personnel Account (MPA) dollars tend to<br />
influence greatly the enlisted accession requirements. For instance, large amounts of MPA<br />
dollars can be saved simply by shifting accessions from the beginning of a year to the end<br />
of the year. Movement of accessions to the 4th quarter has been done for the last few<br />
years and is already included in the FY91 accession requirements by calendar month.<br />
Ironically, this shift to the 4th quarter does not cause USAREC as much concern as it does<br />
TRADOC. The major cuts to the Army budget in the form of dollars and endstrength have<br />
a significant impact on the adjustments to the accession programs for the current and future<br />
years.<br />
USAREC preference is for the following quarterly RSM breakdown:<br />
Quarter Percentage<br />
FIRST 21-23<br />
SECOND 22<br />
THIRD 19<br />
FOURTH 37<br />
First and fourth quarters are the best when considering the market. Some dynamics to<br />
consider here, which are hidden in the numbers, include the fact that high quality soldiers<br />
are easier to enlist with shorter Terms of Service (TOS), the size of next year’s mission<br />
influences recruit entry into the DEP (Delayed Entry Program), and the higher the<br />
aggregate quality goals the tougher the recruiting (unless resources for recruiting are<br />
commensurately increased).<br />
TRADOC prefers an even flow into training at the rate of 8.3% per month of input<br />
from USAREC. They have stated that the surge of training requirements in the fourth<br />
quarters is near impossible to handle. This contention is based on the 4th quarter surge, which<br />
includes Active, Reserve and National Guard input. Even acknowledging that a perfectly even flow is impossible,<br />
fourth quarter surges, closing on 40%, are not resource supportable, especially with the<br />
first three quarters under capacity. It should be noted that the Army is looking at the feasibility<br />
of clustering or combining certain MOS, eliminating MOS, consolidating training, and<br />
eliminating BT/AIT/Reception Battalions (e.g., FT. Bliss); these measures, although long term, will<br />
have a positive effect on some of the problems mentioned.<br />
Specific Accessions<br />
Females. The female enlisted accession floor was initially set in FY88. The most<br />
recent floor is based on slight growth, but growth nevertheless, of 0.2% to 0.5%<br />
each year in the female enlisted end strength compared to the total enlisted endstrength.<br />
This allows female endstrength to be reduced as the Army downsizes but, perhaps, at a<br />
lower rate than males. The final FY96 level will be 13.3% of the enlisted endstrength.<br />
Recruiting females is actually more difficult than recruiting males. There are over 240,000<br />
available MOS slots for females in which to enlist. However, females have a lower<br />
propensity to enlist (11% versus 17% for males, YATS89) and gravitate to the more<br />
attractive MOS such as medical (91), administrative (71), supply (76), and communication<br />
(31), usually the top four MOS for females annually. In FY89, 63% of the females were in<br />
AFQT CAT I-IIIA, all but a handful were high school graduates, 69% took the four-year<br />
terms, 54% were white, 40.5% were black. The female accession floor of 15,500 was<br />
exceeded by 4%.<br />
Establishing a gender neutral accession mission in the Army will likely lower, not raise,<br />
the number of females who enlist. Money for college and shorter terms of service are the<br />
two most important considerations for females and without exceptional resources and<br />
attention to these attractions, and without proper “goaling” of the recruiter, female<br />
accessions would probably be significantly lower, perhaps by as much as 60%. Contract<br />
goals for females in FY88 were 13.4% with 13.7% achieved; FY89 had 16.3% as a goal and<br />
18.1% of total contracts achieved.<br />
Prior Service (PS). With the changes in Army structure there may be more need for PS<br />
to fill resulting holes in the structure, more as a result of unexpected losses of personnel<br />
than from structured reductions in MOS. However, the present PS requirement of 3,000<br />
in FY91, and 2,000 for FY92 appears intact. USAREC and the CMF 18 initiative to<br />
identify specific requirements for special forces NCOs to reenlist is an example of what<br />
should be developed for MOS fill. Essentially, almost half (42% in FY87 and 49% in<br />
FY88) required retraining. All PS are AFQT CAT I-IIIA and the majority (90%) take the<br />
four year term; for FY89, 10.5% were female, 71.6% were white, and 23.9% were black.<br />
Quality. Relevant, empirical research has clearly shown the need for quality (high<br />
school graduates scoring in the fiftieth percentile or above on the AFQT) in the Army. Quality is<br />
a valid predictor of persistence or likelihood to finish one’s term of service and of ability<br />
to train in the first term. As the body of research on the performance of high quality<br />
soldiers (Army Soldier Performance Research Project (SPRP) etc.) is disseminated, there<br />
will be greater understanding of the value of quality soldiers and the interest in job<br />
performance and aptitude and ability testing is likely to increase; second term and later<br />
performance has yet to be rigorously analyzed. Nevertheless, the amount of quality<br />
required is and will continue to be questioned by Army leadership, OSD, and Congress.<br />
Another issue is the logical leap required to go from individual soldier quality to unit<br />
performance and readiness; although the connections were articulated in the 7th annual report<br />
to Congress on linking enlistment standards to job performance, the argument will be<br />
viewed skeptically for some time. The accession quality improvements to date must not be<br />
lost; but asking for more quality, i.e., 67% I-IIIA, 95% HSDG and 4% or less CAT IV,<br />
is stretching the previously well-founded research arguments to the breaking point. The<br />
marginal performance benefit is difficult to quantify, and the cost effectiveness of the<br />
increased quality may not stand up to any close scrutiny.<br />
The Future<br />
Since the endstrength reductions mandated by Congress for the 1991-95 downsizing<br />
precede USAREC structure cuts, USAREC is now recruiting fewer individuals with the<br />
same number of recruiters that had been dictated by the higher annual qualitative and<br />
quantitative objectives of the past. The decrease in recruiting difficulty is therefore only temporary;<br />
the need for models to predict accession requirements based on endstrength goals is critical.<br />
Several accession and force structure models are in various stages of completion.<br />
ALENO (Alternate Enlistment Options) which is being developed by the Concepts Analysis<br />
Agency has the potential to provide future skill level one and two structure requirements<br />
to the MOS level of detail with the input of such variables as term of service, quality and<br />
accessions. ALENO also will translate endstrength requirements into accession inputs by<br />
quality and term of service. In addition, SRA Corporation has developed a prototype of<br />
its Army Force Structure Planning Model (AFSPM) which is aimed at determining<br />
accession requirements in the future considering quality and term of service inputs as well<br />
as retention/attrition rates. The ALENO and AFSPM models should be available within<br />
the next six months. Other models, perhaps less sophisticated, are being developed to<br />
answer the key concerns about accession missions for the future.<br />
In a rather straightforward manner the steady state accession mission can be determined<br />
by past accession ratios. Even considering the variables of term of service mix, gender, and quality<br />
with its collateral attrition/retention rates, and accepting that the new endstrengths<br />
place manpower management in completely foreign territory, the past ratio of enlisted<br />
accessions to endstrength has remained relatively static for many years. Applying ratios of<br />
mission to endstrength for the past six years to the 488,969 enlisted endstrength results in<br />
high and low estimates for the end state mission of 103,200 and 85,600 respectively. A<br />
reasonable estimate for the accession floor to support a 580,000 end state is therefore<br />
85,000. However, with higher quality projected in the outyears (less first term attrition)<br />
and lower average TOS mix, from the FY89 high of 3.88 years to the present average TOS<br />
for FY90 of 3.7 years (and dropping, mostly as the result of offering shorter terms to<br />
attract higher quality), the accession floor is expected to be closer to 90,000 by FY96. The<br />
relationships and effects of these variables to one another and to the accession mission are<br />
considerable. Establishing a Term of Service average objective for USAREC in the annual<br />
mission letters could be used to better align the force for the future downsizing. Although<br />
the recruiting market dictates what can be sold in contracts, an overall TOS mix average<br />
set in aggregate from the MOS requirements/TOS mix would foster more control over the<br />
longevity (and experience) of the force.<br />
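The ratio-based estimate described at the start of this section can be sketched as follows. The high and low ratios are back-computed from the figures quoted above rather than taken from the actual six-year series, so the block merely illustrates the arithmetic:

```python
# Back-of-envelope steady-state accession estimate from historical
# mission-to-endstrength ratios, as described above. The ratio bounds are
# back-computed from the text's own figures (103,200 high and 85,600 low
# against a 488,969 enlisted endstrength), so this is illustrative only.
ENLISTED_ENDSTRENGTH = 488_969

RATIO_LOW = 85_600 / ENLISTED_ENDSTRENGTH    # ~0.175 accessions per soldier
RATIO_HIGH = 103_200 / ENLISTED_ENDSTRENGTH  # ~0.211

def mission_estimate(endstrength: int, ratio: float) -> int:
    """Accession mission implied by a mission-to-endstrength ratio."""
    return round(endstrength * ratio)

print(mission_estimate(ENLISTED_ENDSTRENGTH, RATIO_LOW))   # 85600
print(mission_estimate(ENLISTED_ENDSTRENGTH, RATIO_HIGH))  # 103200
```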
Overall, the systems for establishing and monitoring accessions are in place and have been<br />
effective. The accession objectives, although highly dynamic, can be achieved. The<br />
downsizing, however, leads the Army into completely unmapped terrain which will greatly<br />
test the systems, the personnel managers, and, most notably, the soldiers presently in the<br />
Army.<br />
Reviewed by:<br />
D.S. Crooks<br />
Lieutenant-Commander<br />
Research Coordinator<br />
ETHNIC PARTICIPATION<br />
IN THE CANADIAN FORCES: DEMOGRAPHIC TRENDS<br />
Lieutenant (Naval) D.T. Reeves<br />
Paper presented at the 32nd Annual Conference of the<br />
<strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>, Orange Beach,<br />
Alabama, U.S.A., November 5-9, 1990.<br />
Canadian Forces Personnel Applied Research Unit<br />
Suite 600, 4900 Yonge Street<br />
Willowdale, Ontario<br />
M2N 6B7<br />
Approved by:<br />
F.P. Wilson<br />
Commander<br />
Commanding Officer
Background<br />
ETHNIC PARTICIPATION<br />
IN THE CANADIAN FORCES: DEMOGRAPHIC TRENDS<br />
Lieutenant (Naval) D.T. Reeves<br />
Canadian Forces Personnel Applied Research Unit<br />
Willowdale, Ontario, Canada<br />
INTRODUCTION<br />
Given the current Human Rights climate and the dwindling labour<br />
force, the perceived lack of ethnic minority representation in the<br />
Canadian Forces (CF) is of concern. Socio-demographic trends portend a<br />
Canadian population marked by cultural diversity and an aging, dwindling<br />
labour force. In keeping with the multicultural policy of the Canadian<br />
government, and in response to the proposed expansion of the Primary<br />
Reserves, the CF is reviewing its representation of ethnic minorities.<br />
Currently there is a dearth of most ethnic minorities in the CF<br />
compared to their representation in the general population (Febbraro &<br />
Reeves, 1990). This under-representation is of concern to National<br />
Defence in its efforts to ensure that the cultural diversity of the<br />
Canadian population is reflected in the composition of the CF.<br />
Purpose<br />
The purpose of this paper is to review the present ethnic<br />
composition of both the Canadian population and the CF, and provide a<br />
preliminary examination of immigrant and visible minority recruitment in<br />
one large urban area.<br />
Definitions<br />
CANADIAN ETHNIC DIVERSITY<br />
In order to ensure concept clarity, the definitions of ethnic,<br />
immigrant and visible minority are as follows (Multiculturalism and<br />
Citizenship Canada, 1990):<br />
a. Ethnic. The culture or country of origin of an individual or<br />
one's ancestors.<br />
b. Immigrant. Anyone who is not a Canadian citizen by birth.<br />
c. Visible Minorities. Generally, persons other than Aboriginal<br />
peoples, who are non-Caucasian in race or non-white in colour.<br />
There are in excess of 100 ethnic groups in Canada and, excluding<br />
persons of English and French origins, this represents 25% of the Canadian<br />
population (75% of the Canadian population are of English, French or<br />
multiple English and French origins). Ethnic group concentrations vary<br />
from province to province and from city to city, and many ethnic groups<br />
record populations of fewer than 10,000 members, with a considerable number<br />
of groups registering 3,000 or less. In terms of large groups, there are<br />
only 11 which register more than 250,000 members excluding persons of<br />
British or French origins (Multiculturalism and Citizenship Canada, 1990).<br />
Canada's ethnic composition has changed substantially since the end<br />
of World War II. During the earliest period of Canadian immigration<br />
history, immigrants arrived largely from Britain and France. After 1945,<br />
they came increasingly from other countries in Western and Eastern Europe<br />
and from the United States. More recently, immigrants to Canada have come<br />
primarily from Asia, Africa, the Caribbean, and Central and South America<br />
- although between 1973 and 1980, Europe was still the single largest<br />
source of immigrants. Immigration levels have fluctuated from 30,000<br />
in 1945 to a peak of 282,000 in 1957. Current immigration projections for<br />
the 1990s are between 150,000 and 175,000 per year.<br />
The composition of ethnic populations varies from province to<br />
province. While people with British origins make up the largest<br />
proportion of the population in all provinces except Quebec, the size of<br />
this proportion varies from 90% in Newfoundland to 30% in Manitoba and<br />
Saskatchewan. Persons born outside of Canada currently comprise a larger<br />
part of the Canadian population than at any other time. This foreign born<br />
group, the majority (80%) of whom are Canadian citizens, now represents<br />
approximately 15% of the Canadian population. Most immigrants (53%) live<br />
in three cities: Toronto (32%), Montreal (12%) and Vancouver (10%),<br />
although the specific ethnic mix in each of these cities is different<br />
(Multiculturalism and Citizenship Canada, 1990).<br />
The main source of information about ethnic populations has been<br />
the Canadian Census. In the past, census data have used such narrow<br />
indicators of ethnic origin as language spoken at home, mother tongue,<br />
paternal ancestry, and country of origin. The 1986 Statistics Canada<br />
definition of ethnicity is based upon ethnic origin as it refers to one's<br />
cultural ancestral roots, and may therefore reflect ancestry, nationality,<br />
race, language or religion, but should not be confused with citizenship or<br />
nationality in the strictest sense (Statistics Canada, 1988). In 1986,<br />
for the first time, the census recorded both single and multiple ethnic<br />
origins in order to establish a more accurate picture of the ethnic<br />
make-up of Canada's population. As a result, a substantial proportion<br />
(28%) of Canadians indicated multiple origins in their ancestry. Given<br />
the multiplicity and changing nature of the definition of ethnicity and<br />
the increasing numbers of multiple origin members, caution should be<br />
exercised not to use Canadian ethnic origins in any absolute way.<br />
Canada's largest ethnic origin groups are: 8.4 million British<br />
only, 6.1 million exclusively French origins, and 1.2 million both British<br />
and French. Almost 9.4 million Canadians indicated at least one ethnic<br />
origin other than British or French, and more than 6 million Canadians<br />
reported having non-British and non-French ethnic roots (Multiculturalism<br />
and Citizenship Canada, 1990). Table 1 is based on 1986 census data and<br />
shows the ten largest ethnic groups in Canada. The CF is dominated by<br />
members of British and French origin. The under-representation of most<br />
other ethnic groups is apparent from the Table 2 census data<br />
Table 1<br />
The Ten Largest Ethnic Groups in Canada (1986 Census)<br />
                                 Single      Multiple<br />
                                 Origins     Origins        Total      %a     %b<br />
 1. English                    4,742,040   4,561,910    9,303,950    36.8   18.7<br />
 2. French                     6,087,310   2,027,945    8,115,255    32.1   24.1<br />
 3. Scottish                     865,450   3,052,605    3,918,055    15.5    3.4<br />
 4. Irish                        699,685   2,922,605    3,622,290    14.3    2.8<br />
 5. German                       896,715   1,570,340    2,467,055     9.7    3.5<br />
 6. Italian                      709,590     297,325    1,006,915     3.9    2.8<br />
 7. Ukrainian                    420,210     541,100      961,310     3.8    1.7<br />
 8. Dutch (Netherlands)          351,760     530,170      881,930     3.5    1.4<br />
 9. Polish                       222,260     389,485      611,745     2.4    0.9<br />
10. North American Indian        286,230     262,730      548,960     2.2    1.1<br />
Note. In all calculations, the figure used to represent the total Canadian<br />
population is 25,309,331.<br />
aIndicating single and multiple origins (based upon total response data).<br />
bPercent of Canadian population indicating a single origin.<br />
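The percentage columns in Table 1 are simple ratios against the total population figure given in the note. As a check on the arithmetic, a minimal Python sketch (counts copied from the table; the script is illustrative only, not part of the original report) reproduces the English and French rows:<br />

```python
# Reproduce the Table 1 percentage columns from the raw 1986 census counts.
TOTAL_POPULATION = 25_309_331  # denominator used in all Table 1 calculations

# (group, single-origin count, total count including multiple origins)
rows = [
    ("English", 4_742_040, 9_303_950),
    ("French", 6_087_310, 8_115_255),
]

for group, single, total in rows:
    pct_total = round(100 * total / TOTAL_POPULATION, 1)    # the %a column
    pct_single = round(100 * single / TOTAL_POPULATION, 1)  # the %b column
    print(f"{group}: %a={pct_total}, %b={pct_single}")
    # English: %a=36.8, %b=18.7; French: %a=32.1, %b=24.1
```

The printed values match the table's %a and %b columns to one decimal place.<br />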
Table 2<br />
Representation of Selected Ethnic Groups in the CF (1986 Census)<br />
                        % of Canadian   % of CF Officer   % of CF Non-Officer<br />
Ethnic Origin             Population       Population          Population<br />
British                      25.0             29.0                29.3<br />
French                       24.1             19.2                26.8<br />
German                        3.5              2.9                 2.3<br />
Italian                       2.8               .4<br />
Ukrainian                     1.7              1.3<br />
Dutch                         1.4              1.6                 1.0<br />
Chinese                       1.4               .3                  .1<br />
South Asian                    .9               .2                  .1<br />
Blacka                       1.0               .7                  .2<br />
Aboriginalsb                 2.8              1.4                 2.6<br />
Visible Minoritiesb          6.4              1.8                 1.8<br />
aBlack representation estimates based upon personal communication with the<br />
Director of Personnel Information Systems, 1990. bBased upon Employment<br />
and Immigration statistics (1989).<br />
(Statistics Canada, 1988). Almost all of these figures are below their<br />
corresponding statistics in the general Canadian population, with the<br />
Italian, Chinese, South Asian and Black groups being the most under-represented.<br />
IMMIGRANT AND VISIBLE MINORITY RECRUITMENT<br />
Non-Commissioned Members - Regular Force<br />
A recent review of regular force non-commissioned member (NCM) recruit<br />
applications (conducted at Canadian Forces Recruiting Centre (CFRC)<br />
Toronto in August 1990) indicated that 91.4% of applicants were Canadian<br />
born. Foreign born (immigrant) applications were only 8.6% of the total<br />
of those applying, and well below their Census Metropolitan Area Toronto<br />
representation level of 36% of the population. A further breakdown of<br />
foreign borns revealed that 55% of this applicant group (or 4.7% of the<br />
total applicant population) were members of a visible minority. This<br />
compares with a national visible minority representation of approximately<br />
6% and a Census Metropolitan Area Toronto representation of 13% (visible<br />
minority status of the Canadian born group could not be established using<br />
file information). Typically, the proportion of regular force NCM<br />
applicants who go on to become enrolled is approximately 30%. In this<br />
most recent review, however, only 4% of visible minority applicants and 8%<br />
of foreign born applicants were enrolled. Although these figures are<br />
based on active files, and therefore more will probably enrol before the<br />
end of 1990, they will still remain below the foreign born and visible<br />
minority representation levels for Census Metropolitan Area Toronto and<br />
the nation as a whole. Out of a total of 51 foreign born regular force<br />
NCM applicants, there were four enrolments; and while 28 of these foreign<br />
born applicants were visible minorities, only one was enrolled.<br />
Officers - Regular Force<br />
Regular force officer applicants were 80.1% Canadian born and 19.9%<br />
foreign born, with 34.3% (or 6.8% of the total population) of the foreign<br />
born group being visible minorities. As was noted with the NCM regular<br />
force applicants, both the foreign born and visible minority groups were<br />
well below their respective representation figures for Census Metropolitan<br />
Area Toronto. The pattern of low enrolment seen with regular force NCM<br />
candidates, however, was ameliorated for regular force officer applicants,<br />
of whom 17.1% of foreign borns, 20.8% of visible minorities, and 16.7% of<br />
Canadian borns enrolled. Out of 70 foreign born regular force officer<br />
applicants, there were 12 enrolments; and while 24 of these foreign born<br />
applicants were visible minorities, five were enrolled.<br />
Non-Commissioned Members - Reserve Force<br />
In contrast to the regular force NCM recruiting, a review of NCM<br />
reserve force files indicated a much more positive picture, with Canadian<br />
born applicants representing 56%, foreign borns 44% (8% above Census<br />
Metropolitan Area statistics), and visible minorities 68.8% of the foreign<br />
born group (or 30.1% of the total applicant population). NCM enrolment<br />
percentages for Canadian borns (45.9%) were highest, with enrolments of<br />
33.8% and 32.1% for foreign borns and visible minorities, respectively.<br />
It was also noteworthy that 12.6% of reserve force NCM applicants (22.7% of whom were enrolled),<br />
or 25.6% of foreign borns, were actually non-Canadians, i.e., new arrivals<br />
to Canada. Out of 154 foreign born reserve force NCM applicants, there<br />
were 52 enrolments; and while 106 of these foreign born applicants were<br />
visible minorities, 34 were enrolled.<br />
Officers - Reserve Force<br />
Reserve force officer applications roughly parallel those of regular<br />
force officer applications, with 76.8% being Canadian born, 23.2% foreign<br />
born and 38.5% of the foreign born group members of a visible minority (or<br />
8.9% of the total applicant population). Enrolments for the reserve force<br />
officers were similar for both Canadian borns (37.2%) and foreign borns<br />
(38.5%), and much higher for visible minority members (60%). Out of 13<br />
foreign born reserve force officer applicants, five were enrolled; and<br />
while five of these foreign born applicants were visible minorities, three<br />
were enrolled.<br />
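The enrolment percentages quoted in the four subsections above are simple enrolled-to-applicant ratios. A brief Python sketch (foreign born applicant and enrolment counts taken from the text; the script itself is illustrative and not part of the original study) recomputes the quoted rates:<br />

```python
# Recompute foreign born enrolment rates from the applicant counts in the text.
# (group, foreign born applicants, foreign born enrolments)
groups = [
    ("Regular force NCM", 51, 4),        # text quotes ~8%
    ("Regular force officer", 70, 12),   # text quotes 17.1%
    ("Reserve force NCM", 154, 52),      # text quotes 33.8%
    ("Reserve force officer", 13, 5),    # text quotes 38.5%
]

for name, applicants, enrolled in groups:
    rate = 100 * enrolled / applicants
    print(f"{name}: {rate:.1f}% of foreign born applicants enrolled")
```

Each computed rate matches the percentage quoted in the corresponding subsection.<br />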
DISCUSSION<br />
The present review indicates that the CF regular force does not<br />
reflect the Canadian cultural mosaic. Amongst the under-represented<br />
groups, census figures indicate that Italian, Chinese, Black and South<br />
Asian origins tend to be the lowest. This under-representation, combined<br />
with substantial Canadian populations, makes these groups of special<br />
interest for more focussed research. In terms of recruiting initiatives,<br />
all four groups represent a notable, and as yet untapped, source of<br />
personnel (Chinese, Blacks and South Asians make up the largest and fastest<br />
growing visible minority groups in Canada).<br />
Although immigrant recruitment at CFRC Toronto does not necessarily<br />
reflect national recruiting norms, this preliminary review suggests that<br />
the NCM regular force does not attract immigrants and visible minority<br />
members at a representative rate. This situation is somewhat improved for<br />
regular force and reserve force officers: although they remain significantly<br />
below those levels required for representativeness in Census<br />
Metropolitan Area Toronto, they are above both the 1986 census and national<br />
representation levels for CF visible minority representation. In contrast,<br />
findings for the reserve force NCM applicant group in Toronto suggest that<br />
this group is actually over-represented by both immigrants and visible<br />
minority members.<br />
The willingness of individuals from immigrant groups to apply for<br />
NCM reserve force service in relatively large numbers while, at the same<br />
time, avoiding regular force application, suggests that the attitudes held<br />
by these groups regarding regular force employment may be substantially<br />
different. Since immigrant and visible minority members have not shown an<br />
antipathy to applying for military duty per se, it is important to determine<br />
the specific attitudes held by these groups which may be acting<br />
as barriers to regular force enrolment. Knowledge gained about these<br />
groups in terms of distinct ethnic attitudes toward the CF may be used to<br />
modify the recruiting approach to other under-represented groups for which<br />
study may be problematic (smaller numbers and wider geographic<br />
dispersion). These findings will have important consequences for future<br />
effective ethnic recruiting initiatives.<br />
REFERENCES<br />
Employment and Immigration Canada. (1989). Employment equity availability<br />
data report on designated groups. Technical Services, Employment<br />
Equity Branch. Ottawa: Minister of Supply and Services.<br />
Febbraro, A., & Reeves, D. T. (1990). A literature review of ethnic attitude<br />
formation: Implications for Canadian Forces recruitment (Working<br />
Paper 90-2). Willowdale, Ontario: Canadian Forces Personnel<br />
Applied Research Unit.<br />
Multiculturalism and Citizenship Canada. (1990). Multicultural Canada: A<br />
graphic overview. Policy and Research, Multiculturalism Sector,<br />
Multiculturalism and Citizenship Canada. Ottawa: Minister of<br />
Supply and Services.<br />
Statistics Canada. (1988). Census handbook. Ottawa: Minister of Supply<br />
and Services.<br />
1990 ARMY CAREER SATISFACTION SURVEY<br />
Timothy W. Elig’<br />
U.S. Army Research Institute<br />
To help personnel officials prepare for the eventual downsizing of the Army, the Chief of Staff,<br />
Army (CSA) directed that a survey of soldiers be conducted rapidly. “The downsizing of the U.S. Army is<br />
inevitable,” BG Stroup wrote in a memorandum requesting the Army Research Institute (ARI) to conduct a<br />
survey “. . . to determine the attitudes and concerns of our soldiers about the changes that will take place.”<br />
Even as events in Southwest Asia and Operation Desert Shield have dominated the news headlines,<br />
other important events have continued. Discussions about federal budget deficits, the end of the Cold War<br />
era, increased cooperation between the U.S. and the U.S.S.R, and German reunification are also front page<br />
news that lead to speculation about a reduction in the size of U.S. military forces.<br />
Soldiers may feel that their careers are being victimized by their contributions to the successful<br />
conclusion of the Cold War even as they are asked to risk their lives for their country. Many of the soldiers’<br />
concerns about their career and prospects for downsizing may in fact be made worse by recent events that<br />
have fostered even more uncertainty and curtailed the flow of information on the future make-up of the<br />
Army. Thus it is important to understand the morale of the force as it was just prior to Operation Desert<br />
Shield, in order to understand how soldiers are likely to respond to continuing career uncertainties.<br />
About this Survey<br />
The 1990 Army Career Satisfaction Survey (ACSS) was designed by ARI to answer several questions<br />
raised by the CSA and by DA personnel policy makers and analysts. Administration costs were paid by<br />
HQDA through the Army Research Office’s Scientific Services Program.<br />
This survey was designed to provide an overview of soldiers’ attitudes, perceptions, and intentions<br />
concerning Army downsizing. While not all of these topics are discussed here, the survey included items on:<br />
career plans and intentions; advice to others on joining the Army; the Army experience as preparation for<br />
civilian jobs; organizational commitment and trust; reactions to European thaw in cold war and to<br />
downsizing; expectations about what a smaller Army would mean and what the Army would be like over the<br />
next five years; soldiers’ sources of information on downsizing and their trust in the sources; specific personal<br />
and family concerns about involuntary separation and resources needed to cope with unexpected separation;<br />
financial and emotional resources for separation; reactions to specific personnel management policies that<br />
could be implemented for downsizing; and propensity to accept “early-outs.”<br />
Thirty thousand soldiers (15,000 in enlisted, 10,000 in commissioned, and 5,000 in warrant ranks)<br />
were surveyed in June and July 1990. The main sample of 28,071 represents soldiers at all ranks countable<br />
toward the active strength of the Army on 31 March 1990, with the following exclusions: a) general officers,<br />
b) soldiers with less than 12 months of service, and c) soldiers in the process of separation or retirement.<br />
Another 1,929 soldiers who had been surveyed in previous efforts were also sent this survey in order to<br />
measure attitude changes over the last four years.<br />
Preliminary results from partial returns were provided to HQ, Department of the Army, in late July<br />
and early August. The final results presented here are based on 17,326 returned surveys from 6,997<br />
‘The findings in this report are not to be construed as an official Department of the Army position, unless so designated<br />
by other authorized documents.<br />
commissioned officers, 3,596 warrant officers, and 6,733 enlisted soldiers in the main sample. These data<br />
have been weighted to be representative of the Army.<br />
On the basis of both response rates and margins of error, this survey provides accurate attitude<br />
estimates for the entire Army and for relatively small subgroups. Response rates for the survey were<br />
extremely good. Completed surveys were returned by 58% of the main sample. When adjusted for postal<br />
non-delivery and late returns of completed surveys the overall response rate is 65% (80% of warrant officers,<br />
76% of commissioned officers, and 51% of enlisted).<br />
The overall margin of error is less than 1.3% indicating that 95% of the time a sample estimate of<br />
50% is within 1.3% of how the entire population would respond if surveyed. Margins of error are also quite<br />
small for each of the three main groups (1.3 for commissioned officers, 1.7 for warrant officers, and 1.6 for<br />
enlisted) and for subgroups of soldiers defined by categories such as gender or rank.<br />
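The quoted margins of error are consistent with the standard 95% confidence half-width for a proportion near 50%, 1.96 × sqrt(p(1 − p)/n). A minimal Python sketch (sample sizes from the return counts above; this simple-random-sampling formula ignores any weighting or design-effect adjustments the survey analysts may have applied, so it gives only an approximate lower bound):<br />

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% half-width, in percentage points, for a sample proportion."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Returned surveys in the main sample (counts from the text)
for group, n in [("all returns", 17_326), ("commissioned", 6_997),
                 ("warrant", 3_596), ("enlisted", 6_733)]:
    print(f"{group}: about ±{margin_of_error(n):.1f} points")
```

For these sample sizes the formula yields half-widths under the reported 1.3% overall figure and close to the per-group figures, with the remaining gap attributable to weighting.<br />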
Soldiers Are Positive About Themselves, <strong>Military</strong> Service, and Their Skills<br />
Soldiers are positive about military service for themselves. There is a strong core of committed<br />
soldiers (57% of commissioned officers, 63% of warrant officers, 45% of enlisted) who want to serve for 20<br />
or more years even if they could retire earlier. For many of these soldiers, the kind of work they most enjoy<br />
is available only or primarily in the military. This is most strongly characteristic of commissioned officers.<br />
Soldiers are confident of their own job performance; over three quarters of them said they were well<br />
prepared or very well prepared to perform the tasks in their wartime jobs. Three-quarters of soldiers also<br />
rated their units as combat ready. Soldiers’ confidence in their job performance and military skills is also<br />
reflected in their evaluation of civilian-relevant skills. When asked if they agreed or disagreed with the<br />
statement “I have been taught valuable skills in the Army that I can use later in civilian jobs” 70% expressed<br />
agreement. Soldiers were even more positive about the effects of their Army experiences on skills and<br />
characteristics that would help them obtain civilian jobs; 80% felt that the Army had a positive effect on<br />
specific job knowledge, skills, and abilities, while 86% felt that the Army had a positive effect on personal<br />
characteristics and attitudes. In another recent ARI survey, Benedict (1990) found that even first-term<br />
soldiers recognized the value of their Army experience with 64% to 77% rating the Army as having a positive<br />
effect.<br />
Despite the unsettling times of the first half of 1990, soldiers were positive about recommending<br />
military service to others. When asked what they would tell a good friend who asked for advice on seeing a<br />
military recruiter, soldiers were nearly eight times as likely to tell them that it was a good idea (46%) as to tell<br />
them that it was a waste of time (6%). The rest (47%) would tell their friend that it was up to him or her,<br />
apparently recognizing that military service is not for everyone. When asked specifically about enlistment in<br />
the Army, soldiers were twice as likely to recommend Army enlistment (60%) as enlistment in another<br />
service (27%). Only 13% would recommend not enlisting in any military service. Even on the very personal<br />
issue of their own children joining the military, soldiers were also fairly positive. Although less than one-third<br />
would like to see their daughter join the military, over two-thirds would like to see their son join the<br />
military at some point.<br />
Army Downsizing and Career Opportunities<br />
As we would expect, most officers (61% of commissioned and 55% of warrant) and many enlisted<br />
(40%) said that the chances of war with the Soviet Union were reduced by recent changes in East Germany,<br />
Hungary, Poland, and Czechoslovakia. However, there is still a perceived threat of war because of internal<br />
problems in the Soviet Union (economic problems, Lithuanian independence movement, ethnic unrest and<br />
clashes, etc.). It may be that the 61% of officers and 49% of enlisted who said yes to an increased chance of<br />
war with the Soviet Union were in fact just responding to this item as an increased chance of war, perhaps<br />
civil and not with the U.S.<br />
Although some soldiers said recent world events would probably affect what they do in the Army,<br />
the most likely impacts were seen in force size and promotion potential. As a result of recent world events<br />
48% said it was likely that demands on their time would increase. These soldiers could worry that mission<br />
statements will not be scaled back as resources and structure are cut, or that work details may replace<br />
training time for troops as many experienced during the draw-down of the early 1970’s. Further, two-thirds of<br />
soldiers (79% of commissioned officers, 69% of warrant officers, and 65% of enlisted) said it is likely that<br />
promotion opportunities will decrease as a result of recent world events. As one officer commented, “How<br />
ironic that the very soldiers who brought about this peace dividend are the ones who have to suffer.”<br />
Reductions in force requirements have decreased soldiers’ confidence in their ability to be promoted<br />
and to have the opportunity to complete at least 20 years of Army service. Only 46% of commissioned<br />
officers, 59% of warrant officers, and 52% of enlisted were confident that as the Army becomes smaller they<br />
would be able to stay in the Army and be promoted on or ahead of schedule.<br />
Concerning the size of the future Army, soldiers were asked to predict the likelihood of several<br />
percentage reductions, voting for as many size-cuts as they felt were likely. Three-fourths of soldiers<br />
believe today’s Army will be cut by up to 10%; over one-half believe the cuts will be about 20%; and nearly<br />
one-third believe the cuts will be at least 30%. Commissioned officers voted the reduction as likely to be<br />
considerably larger than did the enlisted and the warrant officers.<br />
For the majority of soldiers, interest in<br />
serving in the Army is not strongly influenced by<br />
the size of force reductions that may be imposed.<br />
However, of the 40% of officers whose interest in serving is influenced by<br />
the size of the force, three-fourths are less interested in serving in a<br />
pared-down Army and only one-fourth are more interested (Figure 1). This<br />
may well be related to fears about quality and career opportunities. Less<br />
than half of officer, warrant and enlisted soldiers are confident that the<br />
best officers, NCOs, and junior (skill level 1) enlisted will stay as the<br />
Army becomes smaller.<br />
While the question was not asked directly,<br />
it is possible that opportunities to exercise<br />
leadership may also be seen as decreasing in a<br />
smaller Army, especially by those less interested in<br />
serving in a smaller Army. It may be important to<br />
point out to those interested in developing their<br />
leadership skills that requirements for creative,<br />
effective leadership are likely to increase during a<br />
transition; and that opportunities to learn these<br />
difficult leadership skills will remain high even in a<br />
smaller Army. It is also likely that these skills will<br />
be in much greater demand in a civilian sector<br />
facing its own pressures for streamlining and<br />
efficiency.<br />
Although less than 10% say they are leaving due to potential changes and cuts, many more are<br />
concerned, and as many as 20% think it was a mistake to stay beyond their original obligation. Further, 30%<br />
say it would take a lot to keep them beyond their current obligation. While only 12% have applied for a job<br />
in the last year, 41% have sought information about civilian jobs in case they leave the Army.<br />
One-fourth expect to be RIFed. Even<br />
more expect to be offered an early out (34% of<br />
commissioned officers and 44% of enlisted). See<br />
Figure 2. At least one-half were more concerned<br />
than a year ago about their long-term opportunities<br />
in the Army (62%), the kind of work they will go<br />
into when they leave the Army (56%), whether or<br />
not they would be able to quickly get a civilian job<br />
if needed (62%), and financial burden on self and<br />
family should they have to leave the Army<br />
unexpectedly (69%). Debt exceeds available<br />
savings for enlisted and warrant officers. One-fourth<br />
would also lose other family member income<br />
because of relocation if separated unexpectedly.<br />
Over three-fourths reported that it would be<br />
difficult or very difficult financially to be<br />
unemployed for two or three months.<br />
Soldiers are also pessimistic about what the<br />
future holds. Compared to how satisfied they said<br />
they are today, fewer soldiers expect to be satisfied<br />
with the Army of 5 years from now in respect to<br />
job security (57% vs 38%), benefits (57% vs 43%),<br />
overall quality of life (49% vs 39%), and<br />
opportunities to do work liked (49% vs 41%).<br />
Officers also expect to be less satisfied with pay<br />
and allowances (55% vs 44% for commissioned<br />
officers and 43% vs 38% for warrant officers). The same percentage (38%) of enlisted are satisfied with pay<br />
and allowances now as expect to be satisfied with pay and allowances five years from now. Beliefs about the<br />
future may determine interest in remaining in the Army. Expected satisfaction with future pay, benefits, job<br />
security, quality of life, and opportunities to do work one likes are each correlated with being more<br />
interested in serving in a smaller Army.<br />
Further, soldiers are even more likely to see the Army as suffering from a rapid draw-down than to<br />
see themselves as suffering. More soldiers agreed that the Army will cut strength so quickly that readiness<br />
(62%) and morale (68%) will suffer than agreed that they (36%) or their family (41%) will suffer.<br />
Information Flow<br />
Three-fourths of soldiers said they are not getting the right amount of information on future<br />
personnel reductions in the Army; in fact 15% of soldiers said they are getting no information. They tend to<br />
credit the Army Times or other media with providing what information they do obtain. One soldier<br />
commented that “Our main source of information for issues on RIF, closure of bases, etc. is the mass<br />
media.” Only about one-half of soldiers think information on cuts in Army strength is reliable when obtained<br />
from the chain of command; one-third said they did not get information on cuts from the chain of command.<br />
Overall, 57% think information on the future of the Army that they receive from the Army itself (chain of<br />
command, post newspapers, etc) is accurate while 40% think it is timely. Roughly five percent of the<br />
respondents were so concerned about this lack of information that they wrote comments about it on the<br />
questionnaire.<br />
Although 3% felt they were getting too much information on future personnel reductions, the<br />
overwhelming majority of soldiers want more information from the chain of command and asked for it in<br />
their comments: “Please keep us informed! Do not keep us in suspense” and “I think families of soldiers<br />
should have more information about their spouses’ careers and pay raises, early outs, pay cuts etc.”<br />
Of course, what they really want is for the dust to settle and for all the decisions to have been made.<br />
As one soldier put it: “I feel that the Army has hurt morale by coming out and saying the Army must<br />
decrease, way before it is time.” Another soldier expressed it in this way: “I feel that the military is moving<br />
kind of fast and who knows what the future holds.” Other comments reflect a perception by some that the<br />
cuts are being made already: “Forget the go slow method . . . Make the cuts/RIFs in one year and get it over<br />
with . . . using promotion boards in lieu of RIFs is having a terrible effect on morale.”<br />
Concerns and Needs if<br />
Involuntarily Separated<br />
While most of the questionnaire<br />
dealt with the current attitudes of Army<br />
soldiers, it also contained questions on<br />
what soldiers’ concerns would be if<br />
involuntarily separated as well as what help<br />
they would need in transitioning to a new<br />
career. Overall, if involuntarily separated,<br />
more than one-half of personnel would be<br />
very concerned or extremely concerned<br />
about separation pay (70%), health and<br />
dental care (63%), securing a job (61%),<br />
unemployment compensation (60%), and<br />
health insurance (58%). Further, more<br />
than one-third were very or extremely<br />
concerned about advancing their education<br />
(48%), finding a place to live (46%), child<br />
care and schools (37%), and spouse<br />
employment (36%). Because some<br />
concerns may not be widespread, but may<br />
be vitally important to those who do have<br />
the concern, soldiers were also asked what<br />
were their three most important concerns.<br />
These most important concerns (adding<br />
together the three selections) are securing<br />
a job (over 80%), finding a place to live<br />
(45%), separation pay (over 30%), and<br />
health and dental care (about 30%). Also,<br />
while only 37% and 36% of soldiers<br />
overall are very or extremely concerned<br />
about child care/schools and spouse employment, the percentages jump to 57% and 50%, respectively, if we<br />
consider only those to whom these questions apply. And while only 28% of all enlisted are very or extremely<br />
concerned about enrollment in GI Bill by paying $1200, 46% of those who are not already eligible are very<br />
or extremely concerned about this. (Note that officers were not asked about Montgomery GI Bill benefits.)<br />
If they were to be involuntarily separated, soldiers saw a variety of job search tools as important<br />
including: labor market information and job banks, time-off (not charged to leave) for interviews and<br />
relocation planning, training and counseling. Specific needs as well as preferences for where services are<br />
provided will become part of the information base used in planning transition services.<br />
Personnel Policies and Other Issues<br />
A major section of each form of the questionnaire (enlisted, commissioned officer, and warrant<br />
officer) dealt with specific personnel policies and concerns. These issues are being examined by the<br />
appropriate divisions of the Office of the Deputy Chief of Staff for Personnel, with continuing support from<br />
ARI.<br />
Work is continuing on demographic differences and analyses of such issues as soldiers’ career intentions<br />
and perceived vulnerability to involuntary separation. We are also examining the issue of where soldiers<br />
would move to if involuntarily separated. This affects how much the Army would have to pay for<br />
unemployment compensation and could affect recruiting markets as well. The data are also being made<br />
available to the Army War College and to Army officers at the Naval Postgraduate School for student<br />
research.<br />
Several of the survey questions were previously used in ARI research efforts. Most importantly,<br />
many of the career intention and commitment items were contributed by ARI’s Longitudinal Research on<br />
Officer Careers (LROC) project (Carney, In Preparation). ARI’s Army Family Research Program (AFRP)<br />
contributed items on readiness, morale, and family situations (Bell, In Preparation). These research groups<br />
at ARI are currently including these items in their analyses.<br />
BIBLIOGRAPHY<br />
Baker, T. (In Preparation). Potential Geodemographic Effects of Army Force Reduction: Where Soldiers Plan<br />
to Move if Separated. Alexandria, VA: U.S. Army Research Institute.<br />
Bell, B. (In Preparation). The Army Family Research Program (AFRP) Survey. Alexandria, VA: U.S. Army<br />
Research Institute.<br />
Benedict, M. E. (1990). The 1989 ARI Recruit Experience Tracking Survey: Descriptive Statistics of NPS<br />
(Active) Army Soldiers (ARI Research Product 90-16). Alexandria, VA: U.S. Army Research Institute.<br />
Carney, C. (In Preparation). Longitudinal Research on Officer Careers. Alexandria, VA: U.S. Army Research<br />
Institute.<br />
Elig, T. W., & Martell, K. A. (1990, October). The 1990 Army Career Satisfaction Survey (ARI Special<br />
Report). Alexandria, VA: U.S. Army Research Institute.<br />
Elig, T. W. (In Preparation). The 1990 Army Career Satisfaction Survey: Descriptive Statistics for<br />
Commissioned Officer, Warrant Officer, and Enlisted Soldiers. Alexandria, VA: U.S. Army Research<br />
Institute.<br />
Elig, T. W. (In Preparation). The 1990 Army Career Satisfaction Survey Technical Manual. Alexandria, VA:<br />
U.S. Army Research Institute.<br />
Elig, T. W., Benedict, M. E., & Gilroy, C. L. (1990, June). ARI 1990 Employer Survey Summary Report (ARI<br />
Special Report). Alexandria, VA: U.S. Army Research Institute.<br />
Hay, M. S., & Middlestead, C. G. (In Preparation). Army Force Reductions, Soldiers’ Career Intentions, and<br />
Perceptions of Vulnerability. Alexandria, VA: U.S. Army Research Institute.<br />
The Use of Artificial Neural Networks<br />
in Military Manpower Modeling
Jack R. Dempsey, D.A. Harris, and Brian K. Waters<br />
Human Resources Research Organization<br />
A new idea is delicate. It can be killed by a sneer or a yawn; it can be stabbed to death by a
quip, and worried to death by a frown on the right man's brow.
--Charlie Brower
The military has been a trailblazer in the realm of manpower modeling and personnel measurement. According
to an old saying, "Necessity is the mother of invention." Due to the formidable recruiting and selection tasks
facing the Services, pioneering efforts have been made and continue to push the military to or past the state of the
art. There are again innovative techniques which the Services are (or should be) considering to aid military selection
strategies.
Military selection policies are a topic of high-level interest and scrutiny. Each of the Services sets standards
for selection on the basis of citizenship, age, moral character, physical fitness, aptitude, and education credential.
The latter two entry criteria are the most visible screening mechanisms and the ones which the Department of
Defense (DOD) uses to define and report recruit quality levels to Congress and other interested parties. Aptitude, as
measured by composite scores from the Armed Services Vocational Aptitude Battery (ASVAB), is used to predict
military technical school performance. Education credentials are used for adaptability screening; that is, they assess
the likelihood of attrition, or, stated positively, the likelihood that a recruit will complete an obligated term of service.
Both aptitude and education credential standards have lately been called into question by Congressional watchdogs.
The flurry of interest in aptitude standards dates back to 1980, when Congress learned that between 1976 and 1980 the
ASVAB norms were incorrect. This resulted in accepting hundreds of thousands of recruits who did not meet the
intended minimum aptitude standards. Furthermore, Congress learned, much to its dismay, that enlistment standards
were validated against training performance, not actual job performance. Congress continues to inquire: What is the
relationship between aptitude and job performance? And how much quality is needed to ensure adequate job
performance? A Herculean, ongoing, multi-year job performance measurement (JPM) project has provided answers
to the first question, while an answer to the second is in progress.
More recently, education standards have come under attack by Congress and educational lobbying groups.
Currently the plethora of credentials is categorized into one of three tiers based upon attrition rates. Each tier has
differential aptitude standards and recruiting preferences. While education credential is the single best predictor of
attrition, objections to this policy revolve around the fact that many individual members of the non-preferred tiers
are successful in service and are therefore wrongfully denied enlistment on the basis of group membership.
The dual problems of linking quality requirements to job performance and implementing more equitable
adaptability screening methods require innovation. Classical statistical techniques may not provide the answer.
These military selection questions require more sophisticated and less familiar modeling techniques. Just how do
techniques such as neural networks complement the more common modeling procedures? The performance prediction
and attrition screening applications described below provide at least a little food for thought and may suggest that
a more in-depth look is required.
Linking Standards to Job Performance<br />
This project's purpose is to bring the Joint-Service Job Performance Measurement/Enlistment Standards (JPM)
Project to fruition. This will be accomplished through four lines of endeavor. First, the military's recruit selection
measures (e.g., ASVAB) must be related to job performance in virtually all occupations. Second, a methodology
must be developed so that empirical data can inform the setting of enlistment standards; that is, the expected job
performance of recruits, over their first term of enlistment, should match total job performance requirements. Third,
improved trade-off model(s) must be developed so that force quality requirements--based on empirically grounded
job performance requirements--are considered along with related costs in the determination of enlistment standards.
Finally, the Services' personnel allocation systems must be made responsive to empirical information about the
performance requirements of particular jobs.
The data used for the Linkage Project consisted of 8,464 individual service members in 24 different occupations
who had been administered hands-on performance tests as a part of the JPM Project. Each record contained ASVAB
subtest scores, time in Service, and education credential or diploma status.
Job characteristics were obtained from an existing Department of Labor data base and represent an assortment
of information about civilian jobs. The data base contained ratings on work complexity; training times; worker
aptitude, temperament, and interest requirements; physical demands; and environmental conditions. Over 12,000 jobs
were rated as part of a massive job analysis project culminating in the publication of the Dictionary of Occupational
Titles (DOT) (U.S. Department of Labor, 1977).
Direct ratings of the occupational characteristics of military jobs were not available; however, these ratings
were estimated by matching equivalent civilian jobs. The results of the Military Occupational Crosscode Project
(Lancaster, 1984; Wright, 1984) were used to determine military-civilian equivalence. After civilian job
characteristics were ascribed to the population of military jobs, the characteristics were factor analyzed. A five-factor
orthogonal rotation was adopted as the most appropriate, interpretable, and parsimonious solution. (These factors are
referred to as PC1-5 in the network that follows.)
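This reduction step can be sketched as follows, approximating the five-factor orthogonal solution with a principal-components decomposition (a simplification; the paper does not detail the exact extraction method), and with simulated ratings standing in for the DOT data:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated job-characteristic ratings: rows are jobs, columns are
# DOT-style ratings (complexity, training time, demands, etc.).
jobs, ratings = 200, 12
X = rng.normal(size=(jobs, ratings))

Xz = (X - X.mean(0)) / X.std(0)        # standardize each rating
corr = (Xz.T @ Xz) / jobs              # correlation matrix of ratings
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1][:5]  # keep the five largest components
loadings = eigvecs[:, order]           # orthogonal loading vectors
factor_scores = Xz @ loadings          # "PC1-5" score per job
print(factor_scores.shape)
```

Each job then carries a five-element vector of factor scores (the M_j used in the models below).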
Regression Approach
Linking standards to job performance requires a performance prediction model. Using performance scores for
24 jobs from the JPM project, individual characteristics, and AFQT scores, we examined a model in which each job
was allowed to have its own intercept. This model was the baseline against which other models and techniques were
compared. This model gives the performance P_ij of individual i in job j as:

P_ij = α_j + β_j T_ij + γ_j E_ij + δ_j X_ij + ε_ij

where:
P_ij = hands-on performance test score,
T_ij = ASVAB technical composite score,
E_ij = education,
X_ij = experience.

Note that the subscript j for job on α_j, β_j, γ_j, and δ_j implies that there is a different coefficient for each job; these
coefficients are treated for the moment as fixed.
The coefficients in this model were estimated with ordinary least squares by using a vector of dummy variables
D for jobs and entering D, T, D x T, E, D x E, X, and D x X. The overall R² for this model is .595, with
93 degrees of freedom for the model and 8,370 residual degrees of freedom. Though a substantial amount of
variance was accounted for, this model is not generalizable to jobs outside of the particular ones included in the
model.
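The dummy-variable estimation just described can be sketched with simulated data standing in for the JPM records (sample sizes, number of jobs, and coefficients below are illustrative assumptions, not the study's values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: n people spread over J jobs (the study used
# 8,464 members in 24 jobs; these sizes are illustrative only).
n, J = 600, 6
job = rng.integers(0, J, size=n)                 # job index j per person
T = rng.normal(50, 10, size=n)                   # ASVAB technical composite
E = rng.integers(0, 2, size=n).astype(float)     # education credential
X = rng.normal(3, 1, size=n)                     # experience
P = 20 + 0.5 * T + 2 * E + 1.5 * X + rng.normal(0, 5, size=n)

# Design matrix: a dummy column per job plus job-specific slopes
# (D, D*T, D*E, D*X) so each job gets its own intercept and coefficients.
D = np.eye(J)[job]                               # n x J dummy matrix
Z = np.hstack([D, D * T[:, None], D * E[:, None], D * X[:, None]])

beta, *_ = np.linalg.lstsq(Z, P, rcond=None)     # OLS fit
P_hat = Z @ beta
r2 = 1 - np.sum((P - P_hat) ** 2) / np.sum((P - P.mean()) ** 2)
print(round(r2, 3))
```

Because the dummy columns span the intercept, the fitted R² is guaranteed non-negative; the job-specific coefficients, however, carry no information about jobs outside the sample, which is the generalizability problem the text raises.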
To ensure generalizability, we examined a model in which job characteristics were used to predict the various
job-specific effects. This is a fixed-effects two-level model which uses job characteristics, expressed as a vector M_j,
to predict the job-specific intercepts and coefficients of the individual characteristics. The two-level form of the model
was expressed as:

P_ij = α_j + β_j T_ij + γ_j E_ij + δ_j X_ij + ε_ij

α_j = α + π_α M_j,  β_j = β + π_β M_j,  γ_j = γ + π_γ M_j,  δ_j = δ + π_δ M_j

where π_α, π_β, π_γ, and π_δ are row vectors of regression coefficients. The fixed-effects model actually estimated
was:

P_ij = α + β T_ij + γ E_ij + δ X_ij + A M_j + B M_j T_ij + Γ M_j E_ij + Δ M_j X_ij + ε_ij

where:

A = (A_1, ..., A_5), B = (B_1, ..., B_5), Γ = (Γ_1, ..., Γ_5), and Δ = (Δ_1, ..., Δ_5)

are vectors of regression coefficients. The R² for this model is .350. This is considerably smaller than the R² of
.595 achieved when intercepts were completely unconstrained. This suggests that the job characteristic factor scores
explain a portion, but by no means all, of the variability in the job-specific intercepts.
The Neural Network Approach<br />
Having witnessed the rather large degradation in variance explained between Model I and Model II (R² = .595
to R² = .350), a neural network paradigm was investigated. Once the candidate explanatory variables were
determined from the second model, the next step was to construct a neural network capable of analyzing the problem.
Actual construction involved five steps, which specified:
o network type
o number of neurodes in the output and hidden layers
o training and cross-validation samples
o transfer function at each layer and global error function
o scaling, learning, momentum, and epoch size parameters
Network Architecture<br />
Since the problem involved a (hetero-associative) mapping of continuous, dichotomous, and polytomous
explanatory variables to a bounded continuous criterion measure of hands-on performance, a feedforward backward-error-propagation
network was chosen.
Ostensibly, the single output "neurode" was the hands-on performance test score. Because the number of neurodes
in the hidden layer of a feedforward network determines the complexity of the function the network is capable of
mapping, 26 was determined to yield a sufficiently complex network. Notably, it has been shown that any
continuous function or "...mapping can be approximately realized by Rumelhart-Hinton-Williams' multilayer neural
network with at least one hidden layer whose output functions are sigmoid functions" (Funahashi, 1989; Hornik,
1989).
The data were randomly split 60/40 into two sets. The first (N = 5,078) was used to train the network and the
second (N = 3,386) was used to validate the network. The transfer function for the output neurode was logistic, while
the transfer functions for the hidden neurodes were hyperbolic. Although any error function which is continuously
differentiable could have been used, we selected the squared deviation between the observed and predicted output
values. Graphically, the network is shown in Figure 1 below.
Figure 1. Project Linkage Cumulative Back-Propagation Network with Hyperbolic Transfer.
The data were scaled to network values between -0.85 and +0.85. Scaling ensured that the neurodes would not
become saturated at the transfer function extremes. When saturation occurs, learning ceases because the gradient of the
error function approaches zero asymptotically. To guard against this, a nominal offset of 0.005 was added to each
derivative. Finally, the learning coefficient was initially set at 0.9 and gradually reduced as learning progressed.
A momentum term of 0.5 was initially used and also gradually reduced. The learning rule chosen was the
normalized cumulative delta rule with an epoch size of five hundred.
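The training recipe above (tanh hidden neurodes, logistic output, squared-error loss, cumulative delta rule with momentum, a derivative offset against saturation, and a decaying learning coefficient) can be sketched in NumPy roughly as follows. The data, the number of inputs, and the squeezing of targets into (0, 1) for the logistic output are illustrative assumptions, not the study's values; only the 26 hidden neurodes and the 0.9/0.5/0.005/500 parameters come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def scale(v, lo=-0.85, hi=0.85):
    """Linearly rescale a 1-D array to [lo, hi]."""
    return lo + (hi - lo) * (v - v.min()) / (v.max() - v.min())

n, d, H = 2000, 8, 26                            # 26 hidden neurodes
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.3 * rng.normal(size=n)
Xs = np.apply_along_axis(scale, 0, X)            # inputs in [-0.85, +0.85]
ys = scale(y, 0.15, 0.85)[:, None]               # targets inside (0, 1)

W1 = rng.normal(0, 0.3, size=(d, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.3, size=(H, 1)); b2 = np.zeros(1)
lr, momentum, offset, epoch = 0.9, 0.5, 0.005, 500
vel = [np.zeros_like(p) for p in (W1, b1, W2, b2)]

for _ in range(200):
    batch = rng.integers(0, n, size=epoch)       # one epoch of presentations
    x, t = Xs[batch], ys[batch]
    h = np.tanh(x @ W1 + b1)                     # hyperbolic hidden layer
    o = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))     # logistic output neurode
    err = o - t                                  # from squared-error loss
    do = err * (o * (1 - o) + offset)            # offset fights saturation
    dh = (do @ W2.T) * (1 - h ** 2 + offset)
    grads = (x.T @ dh / epoch, dh.mean(0), h.T @ do / epoch, do.mean(0))
    for p, g, v in zip((W1, b1, W2, b2), grads, vel):
        v *= momentum
        v -= lr * g
        p += v                                   # in-place parameter update
    lr *= 0.995                                  # gradually reduce learning

mse = float(np.mean((o - t) ** 2))
```

Accumulating the gradient over the whole 500-presentation epoch before updating is what "cumulative delta rule" refers to; the momentum term reuses a fraction of the previous update to smooth the descent.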
Results<br />
After the network was trained on approximately one million random presentations of observations, it was
evaluated against the cross-validation sample. The results are presented below.
                   R²     R
Model I           .595   .77
Model II          .350   .59
Neural Network    .574   .76
A chi-square goodness-of-fit test was then performed, and the hypothesis that the predicted and observed scores came
from different populations could be rejected at the .95 level of confidence. Notably, the neural network cross-validated
coefficient practically matches the unvalidated coefficient for Model I.
The results achieved using a neural network to predict job performance were far superior to regression-based
approaches when generalizability is considered. The above results provide an impetus to expand the investigation
of neural networks to the Adaptability Screening Project.
Adaptability Screening Project<br />
The Adaptability Screening Profile (ASP) project has been described in detail in previous work (Sellman, 1989;
Trent, 1987). Succinctly, the purpose of the project is to: (1) develop a biographic instrument capable of assessing
an individual's propensity to adapt to military life; (2) determine its operational utility in predicting an individual's
likelihood of successful completion of an initial term of enlistment; and (3) utilize the instrument as part of an
enlistment screening procedure. Because biodata instruments are assumed to be fakable and/or coachable, any
ultimate implementation must include a mechanism to detect and correct for response pattern distortion.
Certainly, this is a difficult task. As reported by Walker (1989), the Army's previous attempt at large-scale biodata
implementation, the Military Applicant Profile (MAP), failed. The failure resulted from several factors, including the
lack of an ongoing score monitoring system capable of detecting response pattern distortion. Whatever the reason,
the MAP validity for predicting attrition fell to zero; that is, it became useless for decision-making about individual
applicants. To prevent a full-scale implementation of ASP from suffering a similar fate, NPRDC directed HumRRO
to develop a score monitoring system. The purpose of the ASP score monitoring system is to: (1) deter faking and
coaching; (2) detect response pattern distortion if and when it occurs; and (3) estimate the effects of such
distortion so that statistical adjustments can be made to counter them. Because a more
complete discussion of the score monitoring system is contained in Waters (1989), the following discussion will
concentrate on attacking the distortion problem using neural networks.
Armed Services Applicant Profile
The ASAP data base consists of 120,175 applicants to the four Services. Administration occurred during the
three-month period commencing December 1985 and ending February 1986. Of the applicants, 55,675 were
accessed. These records form a cohort file which will be appended with additional demographic data elements from
the Military Enlistment Processing Reporting System (MEPRS) and the Defense Manpower Data Center Edited
Enlisted Active Duty Master file. Each record will be updated with the inter-Service separation code (ISC), which
will form the basis for 48-month criterion development. Faking/coaching will be simulated by intentionally
distorting response patterns to varying degrees.
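The planned faking/coaching simulation can be sketched as below. The dichotomous items, the per-item "keyed" (score-raising) response, and the flip rates standing in for "varying degrees" of distortion are all illustrative assumptions; the paper does not specify the distortion mechanism.

```python
import numpy as np

rng = np.random.default_rng(2)

def distort(responses, keyed, rate, rng):
    """Flip each non-keyed response to the keyed one with probability `rate`."""
    flip = (responses != keyed) & (rng.random(responses.shape) < rate)
    return np.where(flip, keyed, responses)

# 50 dichotomous items; keys and honest answers are randomly
# generated here purely for illustration.
n_applicants, n_items = 1000, 50
keyed = rng.integers(0, 2, size=n_items)
honest = rng.integers(0, 2, size=(n_applicants, n_items))

for rate in (0.1, 0.3, 0.6):                     # increasing severity of faking
    faked = distort(honest, keyed, rate, rng)
    agreement = float((faked == keyed).mean())
    print(rate, round(agreement, 3))             # agreement with key rises with rate
```

Distorted files generated this way give the monitoring system a known "truth" against which detection rates can be scored.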
Classical Approach to Response Pattern Distortion
One approach to detecting response distortion is to develop a regression-based prediction system which relates
background characteristics of applicants with point estimates of ASP score means, variances, skew, and kurtosis
indices. Demographic information on race, gender, education, home of record, age, number of dependents, and many
other variables is available as predictors of ASP score. Accurate prediction would permit analysis of how well
operational ASP data behaved as compared with "norming" group data, for the total group as well as subgroups.
In attempting to relate ASP score to demographic characteristics, an ordinary least-squares regression was run.
The results yielded an R² = .213 and a root mean square error of 9.198. The large standard error provides the
motivation to determine whether these results can be improved upon using a neural network approach that attempts
to map responses as opposed to total scores.
Neural Network Approach to Distortion Detection<br />
The network paradigm that is currently being investigated is the cumulative backward error propagation
network. The network has fifty outputs representing individual responses. The hidden layer includes 120 neurodes,
each using a hyperbolic transfer function. The inputs include the same demographic information that was
hypothesized to be related to the ASP score in the earlier regressions. The network is shown graphically in Figure 2.
Figure 2. Response Pattern Distortion Detection Network.
Due to the number of calculations involved in training the above network to recognize response pattern
distortion, a mainframe version of the cumulative backward error propagation neural network has been written for
the IBM 4381 and implemented at the Navy Personnel Research and Development Center (NPRDC). The current
implementation is written in FORTRAN 77. Other network paradigms, such as Grossberg's Outstar and counterpropagation,
are in the process of being added. Although results are extremely encouraging, it is
premature to report them at this time.
Summary<br />
Certainly neural network technology is still in its youth; nevertheless, it has experienced significant growth in
recent years and the momentum shows no signs of slowing. Initially, the technology had a "black box" image, but
recent articles such as those by Hornik and Funahashi demonstrate that the neural network approach is well founded in
mathematical theory and has statistical roots. That is to say, a simple ordinary least-squares regression can be
expressed as a neural network, albeit a simple one. Neural networks have the potential for providing unique
approaches and insights into heretofore intractable problems. In the context of military manpower research, the
jury is not yet deliberating, because all the evidence has not been presented. But when it is, we may find we
have new answers to old problems.
REFERENCES<br />
Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural
Networks, 2(3), 183-192.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators.
Neural Networks, 2(5), 359-366.
Hispanics in Navy's Blue-Collar Civilian Workforce: A Pilot Study¹
Jack E. Edwards, Paul Rosenfeld, Patricia J. Thomas<br />
Navy Personnel Research and Development Center<br />
San Diego, CA<br />
The 1964 Civil Rights Act, Title VII, mandated equal employment opportunity (EEO) for all persons regardless
of race, color, creed, national origin, or gender. Congress amended the Civil Rights Act in 1972 to require most
federal agencies to have programs that would help implement EEO policies. During the quarter of a century since
the passage of the Civil Rights Act, Blacks, as a group, have made significant inroads into both previously
segregated organizations and segregated jobs within integrated organizations. Hispanics, however, have not been as
successful in attaining employment opportunities.
The Department of the Navy has been unable to attract Hispanics in proportion to their representation in the
U.S. labor force. In 1980, Hispanic representation in the civilian Navy work force was 3.2% compared to 6.4% in
the total U.S. civilian labor force (CLF). Since 1980, the Navy's civilian Hispanic representation has increased by
only 0.3 percentage points to 3.5%, while Hispanics in the CLF have increased 1.8 percentage points to 8.2%.
Moreover, the Navy's 3.5% rate of Hispanic employment in civilian positions lags behind the Hispanic representation
rates of the Air Force (9.5%), Army (5.0%), and other federal agencies (5.2%) (Secretary of the Navy,
memorandum of 16 May 1989). Given projections that by the year 2000 Hispanics will constitute nearly 11% of the
total U.S. population (Koretz, 1989), it is clear that the Navy needs to "intensify efforts to increase the number of
Hispanics in the civilian work force" (Secretary of the Navy, memorandum of 16 May 1989).
The underutilization of Hispanics, the projections of dramatic Hispanic population growth, and the potential
benefits to the Navy of greater Hispanic representation attest to the need for focused research on the Hispanic
underrepresentation problem. An initial step toward the better utilization of this valuable human resource is to identify
the barriers that have prevented Hispanics from obtaining parity in the work place. Toward this end, the Navy
instituted a four-year EEO Enhancement Research Project to increase Hispanics' opportunities for employment
parity. Previous project work has focused on the difficulties of accurately defining the Hispanic underrepresentation
problem (Edwards & Thomas, 1989; Thomas, 1987), a literature review on the relationships of attitudes and
demographics to work outcomes (Edwards, 1988), and the geographic mobility of Hispanics for employment
(Edwards, Thomas, Rosenfeld, & Bowers, 1989).
Although Navy-related studies of Hispanics have been rare, one previous intensive research effort was
concerned with the barriers faced by Hispanic Navy recruits (cf. Triandis, 1985). In a summary report of their
Navy-funded studies, Triandis (1985) noted that he and his colleagues had found more similarities than differences
in comparisons among Hispanic, Black, and Anglo recruits. Triandis suggested that Hispanic Navy recruits of the
early 1980s were not typical of Hispanics in the general population. In several reports, Triandis and colleagues
argued that their research participants were so acculturated as to be indistinguishable from the mainstream of
American culture. An important job-related component of acculturation is the ability to communicate in English.
The National Commission on Employment Policy (1982) noted that poor English skills and lack of education are
two major reasons for Hispanic labor-market difficulties.
Acculturation should be considered when determining whether Hispanic employees are different from their
Anglo peers. Consideration of acculturation is also important in determining whether an organization is recruiting
from the full Hispanic population or only from an acculturated portion, as Triandis (1985) suggested. A need exists
to determine whether there are differences among the Navy's acculturated Hispanics, less acculturated Hispanics,
and the Anglo majority group in its civilian workforce.
¹The opinions expressed in this manuscript are those of the authors. They are not official and do not represent the
views of the Navy Department. The authors gratefully acknowledge the assistance of Luis Joseph, Jerome Bower,
and Walt Peterson.
Method
Sample
Recruits. The sample was selected from newly hired men in semi-skilled or journey-person jobs as
Department of the Navy craftsmen, mechanics, operatives, or service workers at 14 Navy activities in the continental
United States. Each Hispanic male who entered one of the jobs was asked to voluntarily complete a questionnaire
during his first week of work. A comparison Anglo male was also surveyed whenever his entry into a similar job at
the same activity followed the entry of a surveyed Hispanic male.
Respondents. Six of the 160 completed questionnaires were discarded because the respondents who identified
themselves as Hispanic indicated either (a) that their primary language was something other than English or Spanish
or (b) that their country of origin (e.g., Lebanon) was not such that findings from those individuals would generalize to
persons from more commonly identified Hispanic lands. The surveys for three additional Hispanics could not be
used because the participants did not supply responses to the acculturation index. As a result, 76 Hispanic and 75
Anglo surveys were analyzed.
Survey Instrument<br />
The questionnaire contained 111 items, some of which were included as part of a longitudinal study. Results
pertaining to only four of the categories (demographics, acculturation, need for clarity, and potential factors
considered when taking a job) are reviewed in this paper. A pre-test of the survey determined that it could be
completed in less than 30 minutes. The average readability of the questionnaire was below the sixth-grade reading
level.
Acculturation. The four-item acculturation scale was patterned after Kuvlesky and Patella's (1971) five-item,
ethnic-identification scale. Respondents indicated how frequently they used a language other than English when
they talked to family members, talked to friends, read a newspaper, or listened to a radio or TV. The anchors for the
rating scale were never (1), almost never (2), sometimes (3), usually (4), and always (5).
Need for clarity. Lyon's (1971) four-item, need-for-clarity index asked respondents how important it was to
know in detail: what is to be done, how the job is supposed to be done, the limits of the respondent's authority, and
how well the respondent is doing. Respondents completed the need-for-clarity items using the following rating
format: not important (1), neither unimportant nor important (2), somewhat important (3), important (4), and very
important (5). Respondents were also given the option of indicating that an item was not true (0); such answers
were treated as missing data.
Potential factors considered when taking a job. Four types of factors were investigated: importance of
job-related factors, work-group composition, sources of recruitment, and job-search activities.
Procedure<br />
Defining Hispanic acculturation groups. The Hispanic respondents were grouped into high (n = 35) and low (n
= 41) acculturation groups based upon their responses to the four-item scale. For all analyses, respondents whose
mean acculturation scores were 2.00 or less (i.e., the respondents who never or almost never used Spanish) were
classified as high-acculturation Hispanics (HAHs); the remainder of the Hispanic respondents were classified as low-acculturation
Hispanics (LAHs).
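The grouping rule just described can be sketched as follows; the four-item response vectors below are invented for illustration, not drawn from the survey data.

```python
import numpy as np

# Mean of the four acculturation items at or below 2.00 ("never"/
# "almost never" used a language other than English) -> HAH; else LAH.
scores = np.array([
    [1, 2, 1, 2],   # mean 1.50 -> HAH
    [3, 4, 3, 5],   # mean 3.75 -> LAH
    [2, 2, 2, 2],   # mean 2.00 -> HAH (the boundary is inclusive)
    [1, 3, 4, 2],   # mean 2.50 -> LAH
])
group = np.where(scores.mean(axis=1) <= 2.00, "HAH", "LAH")
print(group.tolist())   # -> ['HAH', 'LAH', 'HAH', 'LAH']
```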
Analyses. Whenever percentages are shown in a table, a chi-square test of independence was conducted to
examine whether a relationship existed between group membership (Anglo, HAH, and LAH) and responses to an
item or a composite. Whenever means are shown, a one-way analysis of variance (ANOVA) was performed with
group membership as the independent variable and an item response or a composite as the dependent variable. A
significant ANOVA result was followed by a Scheffé post hoc test to determine the source(s) of the difference. For
all primary and secondary analyses, the probability level was set at .01. This significance level was chosen as a
balance among three considerations: the exploratory nature of the research, the large number of contrasts performed,
and the already low statistical power caused by the sample sizes.
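The two analysis steps just described (chi-square tests of independence for the percentage items and one-way ANOVAs for the mean items) can be sketched in plain NumPy; the contingency counts and group samples below are invented for illustration, not the study's data.

```python
import numpy as np

def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c contingency table."""
    table = np.asarray(table, float)
    expected = np.outer(table.sum(1), table.sum(0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

def anova_f(groups):
    """One-way ANOVA F statistic across a list of 1-D samples."""
    all_x = np.concatenate(groups)
    k, n = len(groups), all_x.size
    ss_between = sum(g.size * (g.mean() - all_x.mean()) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))

rng = np.random.default_rng(4)
table = [[30, 45], [12, 23], [20, 21]]        # group x yes/no counts (invented)
samples = [rng.normal(m, 1.0, 40) for m in (4.33, 4.49, 4.72)]
chi2 = chi_square_stat(table)
f_stat = anova_f(samples)
print(round(chi2, 2), round(f_stat, 2))
```

The statistics would then be compared against the chi-square and F critical values at the .01 level chosen above.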
Results and Discussion<br />
Demographics
In general, the Anglo and Hispanic groups were very similar (see Table 1). All three groups averaged about 34<br />
years of age, more than 12 years of education, and approximately 17 years of working for pay. Almost all of the<br />
respondents reported that they had been employed previously on a full-time basis and that they were not currently<br />
members of a union. The members of each group averaged similar amounts of time (between 4.50 and 6.75 years)<br />
in their last full-time job.<br />
Table 1
Demographics

Item                                                                  Anglo   HAH     LAH
4.  Age (mean number of years)                                        34.81   33.60   34.00
5.  What is the highest grade you completed in school or college?
    (Count a GED as 12 years.)                                        12.60   12.54   12.28
6.  Since you became 16, how many years have you worked for pay?      17.92   16.69   17.16
56. Is this your first full-time job? (Answered "Yes")                 1.4%    2.9%   10.0%
    If "No," how long were you employed full time in your last
    job? (years)                                                       6.64    4.59    6.66
10. Are you a veteran? (Answered "No")                                40.0%   34.3%   48.8%
11. Are you a member of a union? (Answered "Yes")                      9.1%   16.7%   16.7%
12. Have you worked for the Navy in some other civilian jobs?
    (Answered "Yes")                                                  20.0%   22.9%   37.5%
Two interesting but non-significant differences were observed. First, compared to both Anglos and HAHs, a larger
proportion of the LAHs reported having worked in other civilian Navy jobs. Second, 65.7% of the HAHs were
veterans; that proportion is higher than either the 60.0% for Anglos or the 51.2% for the LAHs.
The overall similarity of the three groups with regard to demographics both clarifies and cautions the
interpretation of subsequent findings. The similarity weakens any argument that demographic differences were at
least partially responsible for any subsequent difference among the groups. For example, the similarity with regard
to veteran status lessens the possibility that the additional points awarded to veterans would differentially affect the
time between application and employment for one or more groups. Still, caution must be exercised in the
interpretation of these and subsequent findings. One reason for caution is the atypicality of the Hispanics in this
sample with regard to education. The Census Bureau (U.S. Department of Commerce, September 7, 1988) reported
that 51% of all Hispanics aged 25 and above had completed high school and/or college during 1987 and 1988.
Although this is an all-time high for Hispanics, it is still markedly lower than the 78% completion rate for non-Hispanics.
Therefore, even though the three groups in this study are similar in terms of education, this study's
Hispanic sample is different from the Hispanic population. Second, conclusions are tenuous because of the small
sample and low statistical power.
Need for Clarity<br />
All three groups indicated a very high need for clarity, with LAHs reporting the highest need for clarity. The<br />
need-for-clarity scale mean for LAHs (4.72) was significantly higher than the mean for Anglos (4.33) and<br />
nonsignificantly higher than that of HAHs (4.49). The situation in the Hispanic population may be more extreme<br />
than implied by that small difference. The lower education level of the Hispanic population, in comparison to the<br />
sample participating in the present study, may result in yet more need for clarity by less-educated Hispanics.<br />
Gould (1982, p. 97) cited several studies that have shown that "Mexican-Americans do not tolerate ambiguity
and uncertainty well." The strong authoritarian role of fathers and the emphases on sex roles and discipline in such
families were suggested as possible reasons for Gould's findings. The significant need-for-clarity difference found
in this study also supports Ash, Levine, and Edgell's (1979) finding that, when given a chance to choose tasks,
Hispanic (more so than Black or Anglo) job applicants disproportionately indicated a preference for jobs in which
others would tell them what to do next.
Potential Factors To Be Considered When Taking a Job<br />
Importance of job-related factors. Table 2 shows the mean ratings for each group for each of the 10 factors. In addition to all three groups evaluating each factor at essentially the same level of importance, the average ratings for the factors showed the same pattern across the three groups. The 10 Anglo means correlated .93 (p &lt; .001) with the 10 corresponding HAH means and .94 (p &lt; .001) with the 10 LAH means. The HAH and LAH means correlated .84 (p &lt; .001). The most important factor for Anglos and HAHs, and nearly the most important factor for LAHs, was<br />
the job security provided by the government. These findings show that all three groups valued the same rewards<br />
and outcomes and that the average value placed on any factor did not vary by group when ethnicity and<br />
acculturation were examined.<br />
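This pattern agreement can be checked directly from the Table 2 means. The sketch below recomputes the Anglo-LAH correlation in plain Python (no statistics package assumed); the two vectors are the Anglo and LAH columns of Table 2:<br />

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Mean importance ratings for the 10 job-related factors (Table 2 columns).
anglo = [4.00, 3.98, 3.97, 3.93, 3.83, 3.75, 3.74, 3.65, 2.93, 2.33]
lah = [4.33, 4.12, 4.37, 4.38, 4.22, 4.28, 4.23, 4.17, 3.05, 3.04]

r = pearson_r(anglo, lah)  # comes out close to the .94 reported in the text
```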
Table 2<br />
Potential Factors to Be Considered When Taking a Job<br />
Anglo HAH LAH<br />
Importance of Job-Related Factors<br />
4.00 4.48 4.33 41. Working for the government provides a lot of job security.<br />
3.98 4.00 4.12 48. I think the job will be interesting or challenging.<br />
3.97 4.23 4.37 46. The government provides EEO for promotions, training, etc.<br />
3.93 4.29 4.38 45. Benefits (time off, health ins., etc.) are good.<br />
3.83 4.20 4.22 42. The pay is good.<br />
3.75 3.65 4.28 43. The hours of my work schedule are good.<br />
3.74 4.12 4.23 40. I badly need a job.<br />
3.65 4.03 4.17 47. I can learn a new skill.<br />
2.93 3.48 3.05 44. I don't have to drive too far or can take a bus.<br />
2.33 2.24 3.04 49. I have friends or relatives working here.<br />
Work-Group Size Preferences<br />
13.78 12.51 14.24 65. What size group would you like to work in? That is, how many people, counting yourself, would you like your boss to supervise?<br />
4.64 3.15 4.08 66. Imagine you were working with 10 other people every day. How many of those people would you like to be of your race and ethnic group?<br />
Recruitment: How did you find out about this job?<br />
% Indicating Sources@ (Place an "X" by as many answers as apply and write in the information asked.)<br />
48.6% 42.9% 56.1% 17. From a friend or relative<br />
21.6% 22.9% 12.2% 16. Federal job listing<br />
12.2% 11.4% 14.6% 15. Newspaper ad<br />
10.8% 11.4% 14.6% 22. Employment office or program<br />
10.8% 17.1% 12.2% 23. Other<br />
2.7% 0.0% 0.0% 21. School counselor or training program<br />
2.7% 0.0% 0.0% 19. I was a trainee or intern for this job.<br />
1.4% 0.0% 7.3% 18. From the union<br />
1.4% 2.9% 12.2% 20. EEO office<br />
Job Search<br />
3.22 2.21 3.31 57. How many months passed between the final day of work on your last full-time job and your first day at work on this Navy job?<br />
5.02 3.60 4.23 58. How many months did it take from the time you filed your application for this job to your first day of work?<br />
2.44 4.00 3.89 59. How many times during the last 3 months did you check the Federal government job listings?<br />
1.35 1.45 1.97 60. During the last 12 months, how many Federal government jobs did you apply for?<br />
4.47 3.26 4.02 61. During the last 12 months, how many other jobs did you apply for?<br />
Note: @ The totals for the Recruitment columns are greater than 100% because respondents could indicate more than one source.<br />
Work-group composition. The average desired number of persons sharing the respondent's race/ethnicity was the same across the three groups (see Table 2). On average, Anglos desired to work in groups that were 46.4% Anglos; HAHs, 31.5% Hispanics; and LAHs, 40.8% Hispanics.<br />
Given that less than 10% of the current U.S. population is Hispanic, the average desirable composition of the<br />
work groups for Hispanics may be unobtainable (even in locations such as those in this study that exceeded the<br />
current national average). Furthermore, assigning a disproportionately high number of Hispanics to the same work<br />
group could result in segregated work groups and open an organization to discrimination complaints.<br />
Sources of recruitment. Nine chi-square tests of independence found no significant relationship between group<br />
membership and method of recruitment (see Table 2). Nearly half of all the respondents indicated that they found<br />
their jobs through a friend or relative. Because there are proportionally a great many more Anglos than members of<br />
other ethnic/racial groups working for the Navy and because the Navy already suffers from Hispanic<br />
underrepresentation, continued reliance on this recruitment method may perpetuate the current representation<br />
problems. Also noteworthy is the fact that so few persons were recruited by employment and EEO offices.<br />
Affirmative action recruitment apparently was not being done or at least was not being done effectively.<br />
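For illustration, one of these nine tests can be sketched as follows. The counts are reconstructed from the Table 2 percentages for the "friend or relative" source, using group sizes of 74 Anglos, 35 HAHs, and 41 LAHs; those sample sizes are assumptions back-calculated from the percentages, not figures reported in the text:<br />

```python
def chi_square_2x3(row_yes, row_no):
    """Pearson chi-square statistic for a 2 x 3 contingency table given as
    two rows of counts (df = 2; critical value 5.99 at alpha = .05)."""
    total = sum(row_yes) + sum(row_no)
    row_totals = (sum(row_yes), sum(row_no))
    stat = 0.0
    for col in zip(row_yes, row_no):
        col_total = sum(col)
        for row_total, obs in zip(row_totals, col):
            expected = row_total * col_total / total
            stat += (obs - expected) ** 2 / expected
    return stat

# "Found job through a friend or relative": yes/no counts per group
# (Anglo, HAH, LAH), reconstructed from the Table 2 percentages.
indicated = [36, 15, 23]
not_indicated = [38, 20, 18]

stat = chi_square_2x3(indicated, not_indicated)  # well below 5.99, i.e., non-significant
```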
Job search. Group means for the months spent getting the current job and the activeness with which the newly hired employees were previously pursuing employment opportunities are shown in Table 2. The short time between leaving a previous full-time job and obtaining employment with the Navy suggests that many of the newly hired employees from all three groups were working elsewhere until the time that they were hired by the Navy. For the other time-related variable, the difference was also non-significant: both Hispanic groups were, on average, marginally faster than Anglos in obtaining their new jobs. Together, these time-based questions seem to indicate that Hispanics and Anglos are being treated equally during the hiring phase whenever they have similar job-related demographic characteristics such as education and veteran's preference.<br />
No ethnic or acculturation difference was detected for the three items measuring how actively the respondents<br />
were seeking their jobs. During the year prior to completion of the survey, the average number of jobs applied for<br />
was 6.00 or less for all three groups.<br />
Conclusions and Recommendations<br />
A goal of the present study was to identify factors among newly hired personnel that might help to explain the reasons for Hispanic underrepresentation in the Navy's blue-collar civilian work force. Overall, the results indicate that both high- and low-acculturated Hispanics were more similar to Anglos than they were different. These similarities were obtained for both demographic variables and factors potentially influencing decisions to take a new position. Echoing Triandis' (1985) findings with Hispanic Navy recruits, the results of the present study indicate that the Navy is attracting Hispanics into its blue-collar workforce who are indistinguishable on a variety of dimensions from the majority (Anglo) group. As research on Hispanics in work settings continues to grow (e.g., Knouse, Rosenfeld, &amp; Culbertson, in preparation), it will be of interest to see whether Hispanics entering other government and private-sector organizational settings are likewise similar to Anglos on key psychological and organizational dimensions. If indeed these Hispanics are, then organizations may need to refocus their efforts to attract those individuals whose characteristics are more reflective of the Hispanic population rather than a subgroup who are indistinguishable from Anglos.<br />
This investigation did, however, reveal one organizational practice (recruitment) and one individual-difference variable (need for clarity) that could be contributing to the lack of parity for Hispanics. The following interventions are suggested for dealing with those issues.<br />
1. Use more formal recruitment methods. An investment in formal recruitment (e.g., advertisements and job fairs designed especially for Hispanic communities) could ease future recruitment costs as Hispanic numbers continue to increase. If no change in recruitment procedure occurs, these findings suggest that the Navy will continue to experience non-parity for Hispanics. The Office of Personnel Management's recently formed "Project Partnership", an alliance with the Hispanic Association of Colleges and Universities and National Image, Inc., may prove useful as a means of increasing the number of Hispanics recruited (Weekly Federal Employees News Digest, March 19, 1990).<br />
2. Train supervisors in providing role clarity. The Navy already has the required vehicle for implementing such training in the form of supervisory EEO training sessions. Supervisors could be presented with (a) methods for structuring tasks and duties and (b) the<br />
processes used in mentoring. While these interventions may be specifically designed to aid less acculturated Hispanics, they also can help employees from other ethnic and racial groups.<br />
References<br />
Ash, R. A., Levine, E. L., &amp; Edgell, S. L. (1979). Exploratory study of a matching approach to personnel selection: The impact of ethnicity. Journal of Applied Psychology, 64, 35-41.<br />
Burnam, M. A., Telles, C. A., Karno, M., Hough, R. L., &amp; Escobar, J. I. (1987). Measurement of acculturation in a community population of Mexican Americans. Hispanic Journal of Behavioral Sciences, 9, 105-130.<br />
Edwards, J. E. (1988). Work outcomes as predicted by attitudes and demographics of Hispanics and non-Hispanics: A literature review (NPRDC Tech. Note 88-23). San Diego, CA: Navy Personnel Research and Development Center.<br />
Edwards, J. E., &amp; Thomas, P. J. (1989). Hispanics: When has equal employment been achieved? Personnel Journal, 68, 144, 147-149.<br />
Edwards, J. E., Thomas, P. J., Rosenfeld, P., &amp; Bower, J. L. (1989, August). Moving for employment: Are Hispanics less geographically mobile than Anglos and Blacks? Paper presented at the meeting of the Academy of Management, Washington, DC.<br />
Gould, S. (1982). Correlates of career progression among Mexican-American college graduates. Journal of Vocational Behavior, 21, 93-110.<br />
Knouse, S. B., Rosenfeld, P., &amp; Culbertson, A. (Eds.). (in preparation). Hispanics and work. Newbury Park, CA: Sage.<br />
Koretz, G. (1989, February 20). How the Hispanic population boom will hit the work force. Business Week, 21.<br />
Kuvlesky, W. P., &amp; Patella, V. M. (1971). Degree of ethnicity and aspirations for upward social mobility among Mexican American youth. Journal of Vocational Behavior, 1, 231-244.<br />
Lyons, T. F. (1971). Role clarity, need for clarity, satisfaction, tension, and withdrawal. Organizational Behavior and Human Performance, 6, 99-110.<br />
Marin, G., Sabogal, F., Marin, B. V., Otero-Sabogal, R., &amp; Perez-Stable, E. J. (1987). Development of a short acculturation scale for Hispanics. Hispanic Journal of Behavioral Sciences, 9, 183-205.<br />
National Commission on Employment Policy. (1982). Hispanics and jobs: Barriers to programs. Washington, DC: Author.<br />
Rojas, L. A. (1982). Salient mainstream and Hispanic values in a Navy training environment: An anthropological description (Tech. Rep. No. ONR-22). Champaign, IL: University of Illinois, Department of Psychology.<br />
Secretary of the Navy (1989, May 16). Memorandum on Hispanic Employment.<br />
Thomas, P. J. (1987). Hispanic underrepresentation in the Navy's civilian work force: Defining the problem (Tech. Note No. TN 87-31). San Diego, CA: Navy Personnel Research and Development Center.<br />
Triandis, H. C. (1985). An examination of Hispanic and general population perceptions of organizational environments: Final report to the Office of Naval Research. Champaign, IL: University of Illinois, Department of Psychology.<br />
U. S. Department of Commerce, Bureau of the Census. (1988, September 7). Hispanic educational attainment highest ever, Census Bureau reports. Press release from United States Department of Commerce News, Bureau of the Census.<br />
U. S. Department of Commerce, Bureau of the Census. (1985). Persons of Spanish origin in the United States: March 1985 (Advance report). Washington, DC: U.S. Government Printing Office.<br />
Weekly Federal Employees News Digest. (1990, March 19). p. 4.<br />
DESCRIPTORS OF JOB SPECIALIZATION<br />
BASED ON JOB KNOWLEDGE TESTS<br />
by<br />
C. Lee Walker, Omnibus Technical Services<br />
Jeffery A. Cantor, Lehman College, CUNY<br />
1. INTRODUCTION<br />
This study was undertaken to determine if job knowledge test and training history data could be used to define a billet substructure reflecting specialization on certain equipments for which a Navy Enlisted Classification (NEC) was responsible. It was hypothesized that such specialization, if existing, could be recognized by score patterns in System Achievement Tests (SATs) and by related patterns in the use of advanced training. Specialization, thus identified, could be confirmed by limited fleet surveying. The investigation produced data which suggested specialization, but more particularly it produced course use patterns and course/SAT score relationships which provide insight into the way ships use advanced training to support readiness. This paper presents information on the methodology thus derived with the hope that it will provide a point of departure for other persons faced with developing training analysis methodologies.<br />
2. APPROACH<br />
The Poseidon Fire Control Technician (NEC 3303) was chosen as the subject for the investigation because with six members per crew it provided an adequate population for developing a methodology that could then be applied to larger populations. Two basic methods of investigation were used: (1) reviews of Personnel and Training Program Evaluation Program (PTEP) Personnel Data System (PDS) data and (2) discussions with training petty officers. The PDS review data was used as a guide in the discussions with fleet personnel.<br />
2.1 PERSONNEL DATA SYSTEM INFORMATION. Scores, course attendance, and duty stations were extracted from the PDS for all FTB 3303 personnel.<br />
2.1.1 Study Population. The study hypothesis required that the records of personnel be reviewed for events or changes occurring over the course of a person's career in order to determine at what point in service specialization on an equipment or groups of equipments took place. For the study, this specialization or substructure was of interest for personnel in submarine crews. These comprise the bulk of the NEC. Because of the continuing evolution of equipment, training and measurement, some time limit needed to be applied to the data used so that analysis results would be relevant. Building on these general requirements, three criteria were established for selection of records for analysis. For each record selected the person had to have:<br />
a. Graduated from "C" school after the prescribed start date.<br />
b. Reported to the crew of an operating SSBN directly after "C" school.<br />
c. Taken four or more SATs.<br />
Review of data extracted indicated that few personnel in paygrade E-4 had sufficient SAT records to be useful for analysis. Personnel in paygrade E-6 had duty station histories which made a "cause-effect" analysis of school records not very useful. It was therefore determined to use a survey population of persons serving in paygrade E-5. Ninety-six persons in paygrade E-5 met the criteria and all were used in the study. Preliminary analysis was done by randomly dividing the population into two equal groups to look for consistency of results. The study methodology was also applied to 16 persons in paygrade E-6 meeting similar criteria to ensure that no discontinuities were introduced by limiting the population to paygrade E-5.<br />
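The three selection criteria in section 2.1.1 amount to a simple record filter. The sketch below is illustrative only; the field names, dates, and example records are hypothetical, since the actual PDS record layout is not described in the paper:<br />

```python
from dataclasses import dataclass, field
from datetime import date

# Field names here are illustrative; the actual PDS layout is not given.
@dataclass
class ServiceRecord:
    c_school_grad: date
    first_duty_ssbn: bool            # reported directly to an operating SSBN crew
    sat_scores: list = field(default_factory=list)

START_DATE = date(1980, 1, 1)        # placeholder for the prescribed start date

def eligible(rec: ServiceRecord) -> bool:
    """The three selection criteria from section 2.1.1."""
    return (rec.c_school_grad >= START_DATE   # (a) graduated after start date
            and rec.first_duty_ssbn           # (b) straight to an SSBN crew
            and len(rec.sat_scores) >= 4)     # (c) four or more SATs

records = [
    ServiceRecord(date(1982, 6, 1), True, [55, 61, 58, 63]),
    ServiceRecord(date(1979, 3, 1), True, [60, 62, 64, 59]),  # fails (a)
    ServiceRecord(date(1983, 1, 1), False, [50, 52, 55, 57]), # fails (b)
    ServiceRecord(date(1984, 5, 1), True, [48, 51]),          # fails (c)
]
survey_population = [r for r in records if eligible(r)]       # keeps 1 record
```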
In order to analyze for the point in a person's career at which responsibility for or expertise on a particular equipment was achieved, events of interest were assigned to relative Time Groups. The Time Groups were based on patrol cycles after graduation from "C" School. Schools or SAT scores were recorded for each individual in the patrol cycle sequence to which they belonged rather than being assigned on a calendar year basis.<br />
Since some records reflected more patrol cycles than others, the part of the survey population still present diminished in the later Time Groups. Table 1 shows the Time Groups used and the records with information for each Time Group.<br />
Table 1. Time Group Population<br />
Time Group: 1 2 3 4 5 6 7 8<br />
Population: 96 96 96 96 67 49 29 11<br />
The events, i.e., scores and schools, were then summarized based on the Time Group into which they fell.<br />
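The Time Group assignment described above can be sketched as a bucketing step; the per-person patrol-cycle boundaries below are hypothetical inputs (integer day offsets stand in for dates):<br />

```python
from bisect import bisect_right

def time_group(event_day, cycle_start_days):
    """Return the 1-based Time Group (patrol cycle) containing an event,
    where cycle_start_days is the sorted list of days on which each of the
    individual's patrol cycles began, the first being "C" School graduation.
    An event before the first boundary would return 0."""
    return bisect_right(cycle_start_days, event_day)

cycles = [1, 100, 200, 300]       # hypothetical cycle boundaries for one person
group = time_group(150, cycles)   # an SAT on day 150 falls in Time Group 2
```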
2.1.3. SAT Score Analysis. The SATs used for this analysis were administered upon completion of "C" School and during each SSBN "Off Crew" period. The tests are broken into several equipment-dependent areas. Test versions are changed every five months with occasional changes in the number and size of areas between test versions. The score analysis was based on test areas with scores recorded in their appropriate Time Groups. Since individuals entered the system at different times, results from several test versions were included in each Time Group. To dampen the difference between test versions, normalized scores were used as the basis of analysis. It was hypothesized that specialization related to a billet substructure should be reflected in higher SAT scores in an area. This<br />
"specialization" score elevation should occur in a way that can be distinguished from the normal increase in scores associated with time. Both overall SAT scores and scores one standard deviation above the mean were examined as possible indicators. Scores of 60 or greater, combined with other training-related data, were selected as the most serviceable indicators of specialization.<br />
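The paper does not give the normalization formula, but treating a score of 60 as one standard deviation above the mean is consistent with a T-score transformation (mean 50, SD 10) applied within each test version. A minimal sketch under that assumption:<br />

```python
from statistics import mean, stdev

def t_scores(scores):
    """Convert one test version's raw area scores to T-scores (mean 50,
    SD 10); on this scale 60 marks one standard deviation above the
    version mean, so scores from different versions can be pooled."""
    m, s = mean(scores), stdev(scores)
    return [50 + 10 * (x - m) / s for x in scores]

# Hypothetical raw scores from two different five-month test versions;
# normalizing within version lets them share one Time Group.
pooled = t_scores([48, 52, 55, 61]) + t_scores([40, 45, 50, 57])
```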
2.1.4. Course Analysis. Attendance at advanced FTB rating training courses was analyzed. Course attendance was recorded in its appropriate Time Group and also identified in relation to specific SAT administrations.<br />
2.1.5. Score, Course, and Experience Indicators. Many relationships between scores, courses, and time were examined as possible indicators of specialization. It was felt that the indicators used should be straightforward to derive and easy to interpret. The following paragraphs detail the indicators chosen and their role in interpretation.<br />
2.1.5.1. Scores Equal to or Greater than Sixty (60s). This is a count, for either groups or test areas, of the number of scores of 60 or greater occurring. Scores over 60 in any Time Group which differ greatly from those expected in a normal distribution suggest that specialization is occurring at that point. Conversely, a less than expected number of 60s suggests limited employment in that area during the Time Group in question.<br />
2.1.5.2. Persons Receiving Scores Equal to or Greater Than Sixty (60P). This is a count of the persons receiving scores equal to or greater than 60. Each person is counted only the first time he receives a 60 or greater. This is a means of determining if the same or different people are getting the high scores.<br />
2.1.5.3. Number of Sixties per Person (60s/60P). This is a ratio of the number of scores of 60 or above to the number of people getting the scores. The number must always be one or greater, with higher numbers indicating more repetition of 60 scores. Repetition suggests continuing on-the-job reinforcement.<br />
40
2.1.5.4. Percentage of Persons Receiving a Score of Sixty or Greater (60PS). This shows for each Time Group the percentage of the survey population receiving a score of 60 or greater. It is used in addition to the actual number of persons (60P) receiving a score of 60 or greater to enable comparison in the higher Time Groups where the population drops off.<br />
2.1.5.5. Persons Receiving a Score of 60 or Greater on the First SAT Taken after Advanced Training. Two indicators were developed based on the test performance of personnel on the first SAT following advanced training. These relate the high scores to the number of people being trained and the scores of trained people to the overall high scores. Figure 2 is a modified Venn diagram depicting the relationship of the indicators.<br />
2.1.5.5.1. Number of Persons Receiving a Score of Sixty or Greater on the First SAT Taken after Advanced Training Relative to the Number of Persons Attending Advanced Training (Sc60/ScP). This is an indication of the relationship between the advanced course and the SAT area. A large number suggests a close content relationship between the course and the SAT.<br />
2.1.5.5.2. Number of Persons Receiving a Score of Sixty or Greater on the First SAT Taken after Attending Advanced Training Relative to the Number of Persons Receiving a Score of Sixty or Greater (Sc60/60P). This is an indication of the effect of schools on performance as measured by SATs. When using 60s as an indicator of specialization it is important to know if the high scores are strongly school influenced or if they reflect primarily work experience.<br />
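Once events are bucketed, the six indicators above reduce to simple counts and ratios. The sketch below assumes a hypothetical per-person record format (a list of SAT area scores plus the index of the first SAT taken after advanced training, None if untrained) and derives the indicators for one test area:<br />

```python
def indicators(people):
    """Compute the section 2.1.5 indicators for one test area."""
    sixties = sum(1 for p in people for s in p["scores"] if s >= 60)        # 60s
    sixty_p = sum(1 for p in people if any(s >= 60 for s in p["scores"]))   # 60P
    trained = [p for p in people if p["first_sat_after_school"] is not None]
    sc60 = sum(1 for p in trained
               if p["scores"][p["first_sat_after_school"]] >= 60)           # Sc60
    return {
        "60s": sixties,
        "60P": sixty_p,
        "60s/60P": sixties / sixty_p if sixty_p else 0.0,
        "60PS": 100.0 * sixty_p / len(people),
        "Sc60/ScP": sc60 / len(trained) if trained else 0.0,
        "Sc60/60P": sc60 / sixty_p if sixty_p else 0.0,
    }

people = [
    {"scores": [55, 62, 64], "first_sat_after_school": 1},  # 60+ right after school
    {"scores": [58, 59, 61], "first_sat_after_school": None},
    {"scores": [50, 52, 54], "first_sat_after_school": 2},
]
ind = indicators(people)  # e.g. ind["60s/60P"] is the repetition rate, here 1.5
```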
3. FINDINGS<br />
The initial focus of this study was on the identification of equipment specialization within an NEC. The research, in addition, yielded data on school utilization and personnel performance which, although not directly related to specialization, provide important insights on the training system. Findings in all three areas have been included in the report.<br />
3.1 SPECIALIZATION. Specialists are persons specifically responsible, or looked to, for operation or maintenance of some limited part of the entire<br />
[Figure 2 is a rectangular Venn diagram; its annotations give the ratios C/A = Sc60/60P and C/B = Sc60/ScP.]<br />
Where A = The number of persons receiving a score of 60 or greater (60P)<br />
B = The number of persons attending advanced training (ScP)<br />
C = The number of persons who both attend advanced training and receive a score of 60 or greater. In deriving C the further qualification was placed that the score of 60 be on the first SAT following advanced training (Sc60)<br />
Figure 2. Relationship of Indicators Based on Scores After School<br />
NEC responsibility. The existence of specialization was identified from personnel data and was confirmed by personnel survey. Specialization may relate to either time or ability.<br />
3.1.1. Specialization with Respect to Time. Specialization with respect to time means that certain equipments or areas become the responsibility of technicians primarily based on experience level. In this type of specialization most people with similar experience levels can be expected to be assigned responsibility for a specific equipment. In general, time specialization breaks down into equipments assigned to newly reported personnel and some reserved for highly experienced personnel.<br />
3.1.2. Specialization with Respect to Ability. This type of specialization reflects some special aptitude: a specific type of ability that leads a person to become involved in most corrective maintenance in an area. Specialization of this nature will begin as soon as the ability is recognized and continue throughout a person's tenure with the command.<br />
3.1.3. No Specialization. Areas with no specialization may be worked by any technician without recourse to specialists. These are areas that are either simple enough or frequently enough worked that a sufficient degree of competence can be expected and utilized in all technicians.<br />
3.2. COURSE ATTENDANCE. There were thirteen advanced courses applicable to the NEC. Very few people attend all courses, and attendance is concentrated in the first part of a technician's first sea duty tour. Three distinct patterns of attendance may be derived from the various courses.<br />
3.2.1 Early Attendance. These courses showed an attendance pattern with very heavy usage prior to and following the first patrol and then diminishing rapidly to little or no use.<br />
43
.<br />
DE.,,RIPTORS<br />
cr c c .J?<br />
3.2.2. Normal Attendance. This attendance pattern begins with little attendance prior to the first patrol, peaks during the second or third off crew, and then tapers off slowly.<br />
3.2.3. Level Attendance. This attendance pattern shows relatively steady attendance over five or six off crews. Each of the courses that had level attendance also had below-average attendance.<br />
3.3. PERSONNEL PERFORMANCE. Personnel performance as measured by the chosen indicators varied greatly between equipments.<br />
3.3.1 Percentage of Persons Receiving a Score of Sixty or Greater on the Test Following Advanced Training. This relationship varied from a low of 17% to a high of 40%. This largely reflects the relationship between the SAT and the course.<br />
3.3.2. Percentage of Persons Receiving a SAT Score of Sixty or Greater Who Did So Following Advanced Training. This relationship goes from a low of 21% to a high of 57%. The average value was 34%, which implies that most persons achieving high scores have not been directly influenced by advanced training.<br />
4. EXAMPLES<br />
Thirteen courses were analyzed as part of the study. Diagrams and discussion are provided on five courses. Two courses show specialization with respect to time and more particularly use early in an assignment. Two courses show no strong evidence of specialization. One course suggests specialization with respect to ability. For each, a rectangular Venn diagram and tabular data are provided on the "sixties"-related indicators. The tabular data shows the average value in each category for all thirteen courses in the study and the value for the particular course or area. In addition there is a graphical presentation of school attendance (C) and "sixty" scores (60) with respect to time periods. The diagrams and accompanying discussion are each presented on a separate page.<br />
Course 1<br />
4.1 TIME SPECIALIZATION, COURSE 1. Specialization with respect to time is indicated by a high initial schooling rate dropping off rapidly and the number of high scores occurring during the 2nd, 3rd, and 4th time periods. The percentage of people getting high scores following schooling, the relative shapes of the "C" and "60" curves, and the low repetition rate of high scores all indicate that experience is of greater importance to high scores than school. The low score repetition rate suggests that people are rotated through responsibility for this equipment and receive little reinforcement on the equipment when not specifically assigned. Interviews with fleet-experienced personnel confirmed that this equipment is generally assigned to new personnel as a place to get them gently started.<br />
[Graph and indicator legend (ScP%, Sc60, 60P%) for Course 2; population by Time Group 1-8: 96 96 96 96 67 49 29 11. Axis: patrol/training periods.]<br />
4.2 TIME SPECIALIZATION, COURSE 2. This is an area of specialization for newly reported personnel. This is shown by the high initial schooling rate and the number of "60" scores obtained during the 2nd time period. At 72% this is one of the most used courses by the NEC; however, high scores following the course occur at little better than the chance rate. Most of the high scores reflect experience, not schooling. The high score repetition rate of 1.50 is the lowest of any of the thirteen areas in the study and suggests this specialization is short lived and that continuing reinforcement in later patrols is not present.<br />
[Graph: school attendance (C) and "60" scores by Time Group; population 1-8: 96 96 96 96 67 49 29 11. Axis: patrol/training periods.]<br />
ScP% Sc60/ScP 60s/60P Sc60/60P 60P%<br />
Avg: 59 .28 1.78 .34 .48<br />
Area: 51 .18 2.05 .21 .15<br />
4.3 NO SPECIALIZATION, COURSE 3. The score pattern shown on the graph is what might be expected with the normal progress of a maturing population. (It must be remembered that the tapering off to the right of the graphs reflects decreasing population rather than a lower percentage of high scores.) School use peaks in the second time period and tapers off rapidly. The number of people getting "sixties" after the course is barely above the chance level of a normal distribution, indicating that the course and test were not well aligned. The relatively high "60s" repetition rate (2.05) suggests a good relationship between the test and the actual work being performed.<br />
[Graph and indicator table for Course 4; population by Time Group 1-8: 96 96 96 96 67 49 29 11.]<br />
Area: 62 .36 1.87 .57 .41<br />
Course 4<br />
4.4 NO SPECIALIZATION, COURSE 4. Tests and personnel data provide no indication of specialization in platform positioning within the survey population. The course appears to be closely related to test content. The very high percentage of persons achieving a 60 who do so following school (57%) suggests a stronger relationship between school and the area than is present for most other courses. Discussions with senior petty officers suggest that this area may actually represent an area of time specialization with specialization occurring outside the study population.<br />
[Graph for Course 5: school attendance and "60" scores by Time Group; population 1-8: 96 96 96 96 67 49 29 11.]<br />
Course 5<br />
4.5 SPECIALIZATION BY ABILITY, COURSE 5. Specialization in this area is suggested by the high percentage of persons who get good scores following the schooling and the very high "60" repetition rate (2.30, highest of the thirteen study courses). This repetition rate suggests continuing on-the-job reinforcement of school material. The course, although very productive, is used on only a limited basis, suggesting it is employed principally when a replacement is wanted for the current specialist.<br />
The indicators derived in this study to analyze th* r?lationshigs k~iec::?2:n<br />
training, _ j& _ knowledge testing and job performance provide analytical<br />
conclusions which are consistent with survey responses. That is, if you form a hypothesis from the indicators, it can consistently be confirmed by survey. The indicator values used here were manually derived from lists of computer data but, once proven, lend themselves to computer analysis on a regular basis. The rectangular Venn diagram was adopted as a solution to the puzzling problem of how to easily show accurate overlap of circles of different sizes. It takes a little practice, but they are easy to use. Interpretation of the various indicators was facilitated by the divergence of patterns. Proposed interpretations could be confirmed by the changes of the indicators with different use patterns. One could say, "if the relationship changes in way X then the indicator will change in way Y" and confirm the hypothesis with another pattern from the study. This use of test and course usage data permits an objective, in-depth analysis of the training relationships that can usually be achieved only by extensive data analysis and survey. While not eliminating all need for on-site survey in training evaluation, it supports less and better focussed survey time.
ADDRESSING THE ISSUES OF "QUANTITATIVE OVERKILL" IN JOB ANALYSIS

Julie Rheinstein
Brian S. O'Leary
Donald E. McCauley, Jr.

U.S. Office of Personnel Management
Washington, D.C.

Paper presented at the 32nd Annual Conference of the Military Testing Association, November 5-9, 1990, Orange Beach, Alabama.
Schmidt, Hunter and Pearlman (1981) have indicated that molecular job analyses are unnecessary in selection research involving traditional aptitude tests. Fine-grained, detailed job analyses tend to create the appearance of large differences in jobs, whereas, in fact, the differences are of no practical significance in selection. Our recent job analysis research has focussed on how job analysis projects can be less detailed and less cumbersome while still allowing one to obtain the necessary information for test development.
O'Leary, Rheinstein and McCauley (1989, 1990) discussed several "holistic" job-analytic approaches used in forming job families. Their research suggests that the traditional fine-grained job-analytic approach may not always be necessary, especially when one is in a fast reaction situation.
In the first phase of a project for the development of an examination for Federal professional and administrative career occupations, job families were formed using a procedure developed by Rosse, Borman, Campbell and Osburn (1985) (see O'Leary, Rheinstein, and McCauley, 1990, for a detailed explanation of the formation of job families). Once the families had been established it was necessary to determine the importance of various abilities for job performance, and which abilities to measure by a written test.
The "inferential leap" (i.e., the inferring of human abilities important for job performance) is traditionally performed by a panel of "subject matter experts." However, there is little guidance in the literature concerning the composition of this panel of experts. As Landy (1988) has so ably indicated, incumbents are the ones most familiar with the job itself but are often unfamiliar with the conceptual or operational characteristics of the abilities. On the other hand, job analysts (often psychologists) are familiar with the characteristics of the abilities but are often not very familiar with the job itself.
The recent work of Butler and Harvey (1988) and Harvey (1989), showing that different kinds of experts (e.g., incumbents versus supervisors) provide different views of a job, and often conflicting information, would seem to suggest that one might get different results in job-ability linkage studies depending upon the composition of the panel of experts. We were able to address this issue by comparing the job-ability linkage ratings made by personnel research psychologists to the same ratings made by job incumbents.
When one conducts a traditional job analysis, the question becomes how much information should be collected. Often raters are asked to rate tasks on several scales, such as importance, time spent, difficulty, or physical demands. Weismuller, Staley and West's (1989) research indicates that ratings on one scale are contaminated by ratings on other scales. Anecdotal findings from job analysts indicate that obtaining ratings on importance, time spent, etc., is unnecessary in most cases because the ratings are highly correlated across scales.
This paper will look at several aspects of job analysis and how the traditional, fine-grained methods may result in "quantitative overkill." We will present data on several techniques for determining the importance of
abilities for test development (Studies 1 and 2), as well as look at the relationship between relative importance and relative time spent ratings (Study 3).
STUDY 1
Data Collection: Ninety-four professional and administrative occupations in the civilian Federal work force were studied. A list of major duties was developed for each occupation. Then a list of abilities was developed by reviewing the construct literature (Northrop, 1989; French, Ekstrom and Price, 1963; and Peterson and Bowans, 1982). Abilities that could not be assessed through a written test were not included. The resulting list contained seven abilities: verbal comprehension, general reasoning, number facility, logical reasoning, perceptual speed, spatial orientation, and visualization.
Using the job-specific duty lists that were developed for each occupation, five research psychologists rated the seven abilities for their importance to each overall job using a five-point scale. The scale ranged from "1 - Unimportant" to "5 - Crucial." It should be stressed that rather than rate each ability against each duty for every job, the psychologists were asked to read each duty list in its entirety and make a "holistic" judgment concerning the importance of each of the abilities for the overall job. The psychologists made holistic ratings for the occupations. Based on each psychologist's overall ratings for each job, averages for each ability were computed for each of the six job families.
Approximately 6,000 job incumbents completed and returned the inventory. As part of this inventory, job incumbents were asked to rate each of the seven abilities using the same five-point scale used by the psychologists. The inventory stressed that the incumbents rate each ability as it related to their overall job. That is, they were asked to make a "holistic" judgment about the importance of each ability. Based on each incumbent's overall rating for their job, averages were computed for each job family for each ability.
RESULTS
The mean overall ability ratings of the psychologists and the job incumbents, for two of the largest job families, can be found in Table 1.
Table 1. Comparisons of ability ratings for psychologists (N=5) and job incumbents, by job family.

                                  Psychologists       Incumbents
                                  Mean    S.D.        Mean    S.D.       N

Business, Finance & Management Occupations
Verbal Comprehension              4.56    (.41)       4.40    (1.10)     2306
General Reasoning                 4.60    (.41)       4.13    (1.13)     2306
Number Facility                   3.99    (.71)       3.54    (1.27)     2306
Logical Reasoning                 4.38    (.48)       3.59    (1.29)     2305
Perceptual Speed                  1.82    (1.12)      2.72    (1.33)     2306
Spatial Orientation               1.68    (.62)       1.57    (1.29)     2306
Visualization                     1.55    (.38)       1.95    (1.46)     2306

Personnel, Administration & Computer Occupations
Verbal Comprehension              4.64    (.44)       4.51    (1.03)     1197
General Reasoning                 4.59    (.40)       4.25    (1.07)     1197
Number Facility                   3.18    (.56)       3.07    (1.31)     1197
Logical Reasoning                 4.44    (.50)       3.59    (1.29)     1197
Perceptual Speed                  1.66    (.92)       2.72    (1.33)     1197
Spatial Orientation               1.34    (.49)       1.57    (1.29)     1197
Visualization                     1.26    (.46)       1.95    (1.46)     1197
The average estimate of reliability (Cronbach's alpha) of the ratings across the six job families was .99 for the psychologists and .84 for the incumbents.

There was very high agreement between the psychologists and the job incumbents in terms of the relative importance of the abilities to the jobs in each job family. The product-moment correlations among the mean ability ratings for the two groups of raters ranged from .96 to .98 and rank order correlations ranged from .89 to .96.
To investigate whether or not the psychologists and the job incumbents agreed in terms of the absolute importance ratings given to the abilities, tests of the significance of the difference between the means for the two groups of raters were performed. In the majority of cases, the pairs of means were found to be significantly different. It should be borne in mind, however, that due to the large numbers in the incumbent group, even very small absolute differences will be statistically significant.
When the mean ratings for both groups were dichotomized into those determined to be important (equal to, or greater than, 3.0--"Important" on the five-point scale) and those determined not to be important (less than 3.0 on the five-point scale), the two groups of raters were found to be in perfect agreement.
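The agreement checks just described (a product-moment correlation, a rank-order correlation, and a dichotomization at 3.0) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the two rating vectors are the Table 1 means for the Business, Finance & Management family, ordered as in the table.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y))
    return num / den

def ranks(x):
    """1-based average ranks, with ties sharing their mean rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

# Mean ability ratings for the Business, Finance & Management family (Table 1)
psych     = [4.56, 4.60, 3.99, 4.38, 1.82, 1.68, 1.55]
incumbent = [4.40, 4.13, 3.54, 3.59, 2.72, 1.57, 1.95]

r = pearson(psych, incumbent)                  # product-moment correlation
rho = pearson(ranks(psych), ranks(incumbent))  # rank-order (Spearman) correlation

# Dichotomize at 3.0 ("Important") and check agreement on which abilities matter
agree = all((a >= 3.0) == (b >= 3.0) for a, b in zip(psych, incumbent))
print(round(r, 2), round(rho, 2), agree)
```

For this single family both correlations come out high and the dichotomized classifications agree on every ability, mirroring the pattern reported across the six families.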
STUDY 2<br />
Wernimont (1988) has indicated that governmental guidelines on employee<br />
selection still emphasize the necessity of focussing on job tasks and duties in<br />
job analysis, followed by documentation and justification for the inferences made<br />
about needed abilities. Perhaps this is a function of the fact that, as Schmidt<br />
(1988) points out, the empirical data upon which these governmental guidelines<br />
are based are inadequate in many areas, particularly job analysis.<br />
Our job-analytic research provided an opportunity to add to the empirical database by determining if job-ability linkage results obtained in a "holistic" manner (i.e., by having job incumbents rate the importance of an ability to overall job success) were comparable to the job-ability linkage results obtained by requiring incumbents to rate the importance of abilities for each duty they perform. If the results obtained from the two methods were found to be similar, significant reductions in the cost, as well as the intrusiveness, of the job analysis process for test development could be possible.
Data Collection: As indicated earlier, in the job analysis inventory the incumbents were asked to rate each of the seven abilities, using a five-point rating scale, for their importance to overall job performance (i.e., the holistic approach). Average ability importance ratings were then computed for each job family.
After rating the importance of the ability to the overall job, these same incumbents were asked to rate the importance of each of the seven abilities to the performance of each individual job duty they had previously indicated they performed (i.e., the traditional fine-grained, duty-ability linkage approach). Using this traditional approach, a mean ability rating was determined by summing each incumbent's ability rating for each duty performed and then dividing by the number of duties that the incumbent performed. Averages were then computed for each job family for each ability.
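The duty-level averaging described above can be sketched as follows; the duty identifiers and ratings are hypothetical, and each incumbent rates only the duties he or she actually performs.

```python
def incumbent_mean(duty_ratings):
    """Mean rating of one ability across the duties an incumbent performs."""
    return sum(duty_ratings.values()) / len(duty_ratings)

# Hypothetical per-duty importance ratings of a single ability (1-5 scale)
# for three incumbents; keys are duty identifiers.
incumbents = [
    {"d01": 5, "d02": 4, "d03": 4, "d07": 3},
    {"d02": 5, "d05": 5},
    {"d01": 3, "d03": 4, "d04": 2},
]

means = [incumbent_mean(d) for d in incumbents]

# Job-family average for this ability under the traditional approach
family_mean = sum(means) / len(means)
print(means, family_mean)  # → [4.0, 5.0, 3.0] 4.0
```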
RESULTS
The average ratings for each ability across duties performed, for the same job families, are presented in Table 2. For ease of comparison, the mean overall ratings for the incumbents given in Table 1 are repeated in Table 2.
Table 2. Comparison of incumbents' mean ability ratings across job-specific duties and mean overall (holistic) ability ratings, by job family.

                                  Job Specific                  Holistic
                                  Mean    S.D.     N            Mean    S.D.      N

Business, Finance & Management Occupations
Verbal Comprehension              4.01    (.77)    2288         4.40    (1.10)    2306
General Reasoning                 3.99    (.65)    2287         4.13    (1.13)    2306
Number Facility                   3.22    (.85)    2250         3.54    (1.27)    2306
Logical Reasoning                 3.59    (.80)    2246         3.59    (1.29)    2305
Perceptual Speed                  2.57    (.91)    2101         2.72    (1.33)    2306
Spatial Orientation               1.96    (.96)    1737         1.57    (1.29)    2306
Visualization                     2.38    (1.08)   1897         1.95    (1.46)    2306

Personnel, Administration & Computer Occupations
Verbal Comprehension              4.12    (.75)    1197         4.51    (1.03)    1197
General Reasoning                 4.11    (.62)    1194         4.25    (1.07)    1197
Number Facility                   2.66    (.94)    1155         3.07    (1.31)    1197
Logical Reasoning                 3.74    (.77)    1174         3.59    (1.29)    1197
Perceptual Speed                  2.20    (.96)    1048         2.72    (1.33)    1197
Spatial Orientation               1.73    (.88)    899          1.57    (1.29)    1197
Visualization                     2.08    (1.05)   938          1.95    (1.46)    1197
The average estimate of reliability (Cronbach's alpha) of the ratings across the six job families was .80 for the incumbents' ratings across duties and .84 for the incumbents' holistic ratings.
There was very high agreement between the incumbents' holistic ratings and the average job-specific duty ratings in terms of relative importance. The product-moment correlations among the mean ability ratings from the two types of ratings ranged from .98 to .99 and rank order correlations ranged from .85 to 1.00.
In terms of the absolute importance ratings given to the abilities, tests of the significance of the difference between the means for the two types of ratings revealed that, in all but one case, the pairs of means were statistically different.
When the mean ratings for both types of ratings were dichotomized into those determined to be important (again, equal to or greater than 3.0--"Important" on the five-point scale) and those determined not to be important (less than 3.0 on the five-point scale), the two types of ratings were found to be in agreement in all but three instances.
STUDY 3
Data Collection: As part of the five-section inventory, incumbents were asked to rate 57 generalized work behaviors (GWB's) developed specifically for the 113 professional and administrative occupations (see O'Leary, Rheinstein, and McCauley, 1990, for a detailed discussion of the development of the GWB's). The GWB's were rated for relative importance and relative time spent. Incumbents were first asked to check the GWB's they perform. Then, they rated the ones they checked using a 5-point relative importance scale ranging from "1 - Unimportant" to "5 - Crucial" and a 5-point relative time spent scale ranging from "1 - Very much below average time" to "5 - Very much above average time." The two ratings for each of the 57 GWB's were correlated across occupations, yielding 57 correlations.
RESULTS
Correlations between the ratings on the two scales ranged from .77 to .93 for the 57 GWB's, with a mean r of .89, indicating a strong relationship between relative importance and relative time spent ratings.
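The Study 3 computation, one importance/time-spent correlation per GWB across occupations, can be sketched like this. All ratings shown are hypothetical, and three GWB's stand in for the 57.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical mean ratings across five occupations for three GWB's,
# given as (importance ratings, time-spent ratings). The study computed
# one such correlation for each of the 57 GWB's.
gwb_ratings = {
    "writes correspondence": ([4.2, 3.1, 2.5, 4.8, 1.9], [4.0, 3.3, 2.2, 4.5, 2.1]),
    "keeps records":         ([3.0, 4.1, 2.2, 3.6, 2.8], [2.9, 3.8, 2.5, 3.3, 3.0]),
    "interviews persons":    ([1.8, 2.9, 4.4, 2.1, 3.5], [2.0, 2.6, 4.1, 2.4, 3.2]),
}

# One importance/time-spent correlation per GWB, plus their mean
rs = {g: pearson(imp, ts) for g, (imp, ts) in gwb_ratings.items()}
mean_r = sum(rs.values()) / len(rs)
```

With real data, a high mean of these per-GWB correlations is what justifies dropping one of the two scales.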
DISCUSSION
Sackett, Cornelius, and Carron (1981), Cornelius, Schmidt, and Carron (1984), and others have shown in a classification setting that holistic judgments compared favorably with those made on the basis of large-scale job analyses. Study 2, described above, showed similar results in that the relative importance of abilities as measured through linkages with job-specific duties was nearly identical to that obtained from linkages with the job as a whole. The results obtained in Study 1 suggest that similar ratings of the importance of abilities to job performance can be obtained from holistic ratings made by two types of raters--psychologists and job incumbents. The determination of which abilities were important to job performance was identical for the two groups of raters.
At first glance, these findings would appear to make a strong case for saying there is overkill in the job analysis process and that it is possible to streamline job analysis procedures for test development. In situations where one needs results in a hurry, holistic methods can be used. In addition, in this study it was found that it is not necessary to have incumbents rate both importance and time spent, unless occupations such as police officer or fireman are being studied. It is well known that it is important for police officers to be able to use a gun properly, even though they may not spend a lot of time doing it.
However, the equivalence of the results obtained from the three sources and, thus, the interchangeability of the sources, ultimately depends upon the use to which the information will be put. As was mentioned above, if job analysts want to determine which abilities are important for job performance, the three sources of data produce virtually equivalent results. If other types of decisions are to be made (e.g., weighting the parts of an ability test battery to achieve a composite score), the absolute differences among the mean ratings could produce different results. While one could not claim that the results obtained by the three different methods were equivalent in the terms outlined by Gulliksen (1968), it would seem that they could be used interchangeably in some circumstances.
REFERENCES
Butler, S.K. and Harvey, R.J. (1988). A comparison of holistic versus decomposed rating of Position Analysis Questionnaire work dimensions. Personnel Psychology, 41, 761-771.

Cornelius, E.T., Schmidt, F.L., and Carron, T.J. (1984). Job classification approaches and the implementation of validity generalization results. Personnel Psychology, 37, 247-260.

French, J.W., Ekstrom, R.B., and Price, L.A. (1963). Kit of reference tests for cognitive factors. Princeton, N.J.: Educational Testing Service.

Gulliksen, H. (1968). Methods for determining equivalence of measures. Psychological Bulletin, 70(6), 534-544.

Harvey, R.J. (1989). Incumbent versus supervisor ratings of task inventories: Overrating, underrating, contamination, and deficiency. In press.

Landy, F.J. (1988). Selection procedure development and usage. In S. Gael (Ed.), The job analysis handbook for business, industry and government. New York: John Wiley and Sons, Inc.

Northrop, L.C. (1989). The psychometric history of selected ability constructs. U.S. Office of Personnel Management.

O'Leary, B.S., Rheinstein, J., and McCauley, D.E. (1990). Developing job families using generalized work behaviors. Proceedings of the Annual MTA Conference, Orange Beach, AL.

Peterson, N.G. and Bowans, D.A. (1982). Skill, task structure, and performance acquisition. In Dunnette, M.D. and Fleishman, E.A. (Eds.), Human performance and productivity: Human capability assessment. Hillsdale, N.J.: Lawrence Erlbaum Associates.

Rosse, R.L., Borman, W.C., Campbell, C.H., and Osburn, W.C. (1985). Grouping Army occupational specialties by judged similarity. Unpublished paper, 1984.

Sackett, P.R., Cornelius, E.T., and Carron, T.J. (1981). A comparison of global judgment versus task-oriented approaches to job classification. Personnel Psychology, 34, 791-804.

Schmidt, F.L., Hunter, J.E., and Pearlman, K. (1981). Task differences as moderators of aptitude test validity in selection: A red herring. Journal of Applied Psychology, 66, 166-185.

Schmitt, N. (1987). Principles III: Research issues. Paper presented at the second annual conference of the Society for Industrial and Organizational Psychology.

U.S. Equal Employment Opportunity Commission, U.S. Civil Service Commission, U.S. Department of Labor, and U.S. Department of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43(166), 38290-38309.

Weismuller, J.J., Staley, M.R. & West, S. (1989). CODAP: A comparison of single versus multi-factor task inventories. Proceedings of the Annual Military Testing Association Conference, San Antonio, TX.

Wernimont, P.F. (1988). Recruitment, selection and placement. In S. Gael (Ed.), The job analysis handbook for business, industry, and government. New York: John Wiley and Sons, Inc.
Developing Job Families Using Generalized Work Behaviors

Brian S. O'Leary
Julie Rheinstein
Donald E. McCauley, Jr.

U.S. Office of Personnel Management

Introduction
This paper describes one phase of a large-scale research project aimed at developing and refining a list of work behaviors common to approximately 100 different Federal professional and administrative occupations. In this phase of our research, we were attempting to form job families and to describe how these job families differ in terms of the relative time spent on general work behaviors.
Traditional systems for describing jobs have usually focussed on describing a single job rather than attempting to determine the similarity among jobs. Thus, many of the traditional means of describing jobs, such as task analysis, are somewhat limited when one tries to compare across jobs.
Since one of the ultimate uses for our research was the development and documentation of selection tests, we needed a method of comparing jobs using some form of work behavior as a unit of measurement. Our goal was to develop a method of comparing jobs in such a way as to be consistent with provisions
of the Uniform Guidelines. The Guidelines define "work behavior" in the following manner: "an activity performed to achieve the objectives of the job. Work behaviors involve observable (physical) components and unobservable (mental) components. A work behavior consists of the performance of one or more tasks. Knowledges, skills, and abilities are not behaviors although they may be applied in work behaviors" (Section 16, 43FR38308).
Development of the Generalized Work Behaviors (GWB's)

First it was proposed that a list of occupation-specific duties be constructed. A list of generalizable work behaviors could then be generated by grouping the occupationally specific duties in terms of common underlying work behaviors.
We extended the work of Outerbridge (1987) in the development of our GWB's. Outerbridge had developed a list of 32 GWB's. She used duty statements contained in the occupational definitions in the Dictionary of Occupational Titles (DOT) for 24 populous Federal professional and administrative occupations.
A list of 223 duty statements was extracted from the DOT. Each one was placed on a separate card. These duty statements were then sorted into categories describing similar work behaviors, first by a group of 10 personnel psychologists and later by a group of 10 occupational specialists. Nineteen sorters provided usable data. The 19 separate sorts were summarized and compared after transformation into matrix form. Matrix representation allowed the development of final work behavior categories using cluster analysis to discover the structure within the summarized data and also allowed the quantitative comparison of the sorter categorizations. A list of 32 defined work behaviors was developed. Final categories were named and definitions were added to suggest the commonality among duty statements making up each category.
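One plausible way to realize the matrix summary of the sorts described above is a pairwise co-occurrence count over duty statements. The actual study used formal cluster analysis on the summarized matrix, so the simple majority-link grouping below is only a stand-in for that step, and the sorts themselves are hypothetical.

```python
from itertools import combinations

# Hypothetical sorts: each sorter assigns every duty statement to one of
# his or her own categories (labels need not match across sorters).
sorts = [
    {"d1": "A", "d2": "A", "d3": "B", "d4": "B"},
    {"d1": "x", "d2": "x", "d3": "x", "d4": "y"},
    {"d1": "1", "d2": "1", "d3": "2", "d4": "2"},
]

duties = sorted(sorts[0])

# Matrix summary: for each pair of duties, count the sorters who placed
# both duties in the same category.
co = {pair: sum(s[pair[0]] == s[pair[1]] for s in sorts)
      for pair in combinations(duties, 2)}

# Stand-in for the cluster analysis: link any pair grouped together by a
# majority of sorters, then take connected components as categories.
parent = {d: d for d in duties}

def find(d):
    while parent[d] != d:
        d = parent[d]
    return d

for (a, b), n in co.items():
    if n > len(sorts) / 2:
        parent[find(a)] = find(b)

clusters = {}
for d in duties:
    clusters.setdefault(find(d), []).append(d)

print(sorted(sorted(c) for c in clusters.values()))  # → [['d1', 'd2'], ['d3', 'd4']]
```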
In the present study we began by reviewing OPM's Classification and Qualification Standards for each of 113 professional and administrative occupations. For each occupation, the major job-specific duty statements were extracted from the Standards. Approximately 10 to 15 major duty statements were obtained for each occupation. In total, over 1,400 job-specific duty statements were developed.
Using the 32 GWB's developed by Outerbridge, we had four psychologists sort each of the 1,400+ job-specific duties into the 32 GWB's, if applicable. Sorters were instructed to sort the duties on the basis of work behaviors. Job-specific duties that could not be sorted into the 32 GWB's were placed into a miscellaneous category. Sorters were advised to put job-specific duties into the miscellaneous category if they had reservations about placing them in any one of the generalized work behavior categories. Sorters were also instructed to develop new generalized work behavior categories if they found that several job-specific duties did not fit into any of the GWB categories but seemed to describe a common underlying work behavior.
For the group of sorters, the average time required for the sorting task was approximately 8 hours. Sorters generally broke up the task into 2 half-day segments. If three out of the four sorters classified a specific job duty into a GWB category, we considered it to be a match. Using this criterion, about 75% of the 1,400+ job-specific duties were able to be classified into the 32 GWB's.
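The three-of-four match criterion can be sketched as a simple majority vote over the sorters' category assignments; the duty statements and GWB labels below are hypothetical.

```python
from collections import Counter

# Hypothetical data: four psychologists each assign a duty statement to a
# GWB category, or to "misc" when unsure.
duty_sorts = {
    "analyzes budget requests": ["GWB-05", "GWB-05", "GWB-05", "GWB-12"],
    "drafts press releases":    ["GWB-21", "GWB-21", "misc", "GWB-09"],
}

def match(assignments, needed=3):
    """Return the agreed GWB when at least `needed` sorters concur, else None."""
    category, votes = Counter(assignments).most_common(1)[0]
    return category if votes >= needed and category != "misc" else None

matched = {duty: match(votes) for duty, votes in duty_sorts.items()}
print(matched)  # the second duty fails the 3-of-4 criterion
```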
The original sorters also developed 18 additional GWB categories. Using the 332 job-specific duties that could not be sorted into the original 32 GWB's, another group of 4 psychologists sorted these job-specific duties into the 18 additional GWB's and were also told to develop new GWB's if necessary and appropriate.
In total, 25 additional GWB's were developed. Using the same 75% agreement criterion, 290 more job-specific duties were classified into a generalized work behavior category. Out of a total of 1,438 job-specific duties, only 42, or about 3%, could not be classified into a generalized work behavior. Table 1 shows two examples out of the total 57 GWB's developed.
Table 1 Examples of Generalized Work Behaviors

1. Presents information about work of the organization to others: e.g., Describes agency programs and services to individuals or groups in community or to higher management.

2. Applies regulations to organizational programs and activities: e.g., Selects and interprets laws to ensure uniform application on wage and hour or safety and occupational health issues and in the sale and leasing of property.
Rating the Generalized Work Behaviors
These 57 GWB's were included in a five-section inventory that was sent to about 14,000 incumbents in 113 occupations. Approximately 7,000 inventories were completed and returned. Of the 7,000 inventories that were received, about 6,000 were from the 94 occupations under study herein. As part of the inventory, incumbents were first asked to read all the GWB's and then check the ones they perform.
One of the first questions we investigated was what types of GWB's are performed most often across jobs. Table 2 presents the six GWB's that are performed the most as well as the six that are performed the least.
Table 2 Most Frequently and Least Frequently Performed Generalized Work Behaviors

Most frequently performed

Writes correspondence, memoranda, manuals, technical reports, or reports of activities and findings.

Interviews or confers with persons to obtain information not otherwise conveniently available or gathers facts on specific issues from knowledgeable persons: e.g., Interviews persons, visits establishments, or confers with technical or professional specialists to obtain information or clarify facts.

Analyzes and interprets information and makes recommendations based on findings: the information can be numerical or presented in verbal or pictorial form.

Responds to inquiries from the public, other agencies, Congress, etc., concerning the work of the activity.

Keeps records and compiles statistical reports.

Reviews documents for conformance to standard procedures, verifying correctness and completeness of data and authenticity of documents: e.g., May audit financial data.

Least frequently performed

Performs policing functions such as arresting and detaining persons and seizing contraband.

Writes, tests, and documents computer programs.

Sells property or arranges for disposal of property, supplies or records: e.g., Inventories, advertises and sells a delinquent taxpayer's seized property or disposes of archival records.

Plans and directs organization's public relations function.

Inspects persons, baggage, or other materials. Inspection involves at least some physical action by the inspector.

Drafts regulations based on an analysis of information: e.g., Drafts regulations on transportation systems or employment and training legislation.
As can be seen in Table 2, writing, interviewing, record-keeping, ensuring compliance with regulations and providing information to the public are the GWB's that are most frequently performed across these professional and administrative occupations. The least frequently performed GWB's are those that are more specific to a particular occupation, such as police work.
I General management and supervisory functions
II Evaluating programs and ensuring compliance with regulations
III Dissemination of information
IV Gathering, classifying, and organizing information
V Budgeting and accounting functions
VI Application of rules and regulations - making determinations
VII Planning and developing policy and procedures
VIII Computer utilization
IX Police functions
X Investigating and arbitrating
XI Interviewing
The next question addressed was "how do the job clusters differ on the 11<br />
factors of GWB's?" A dimension score was calculated for each factor by<br />
summing the item scores which loaded on that factor. These scores were then<br />
standardized. A mean profile on the 11 factors was computed for each of the<br />
six job clusters formed from the Q-factor analysis of the GWB's. Table 3<br />
lists for each job cluster the GWB factors that were rated above the mean for<br />
relative time spent.<br />
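The dimension-scoring procedure just described (sum the items loading on each factor, standardize across jobs, then average within each cluster) can be sketched in modern Python. The item scores, factor keys, and cluster labels below are invented stand-ins, not the study's data; unit-weighted sums and population standardization are assumptions.<br />

```python
import numpy as np

def dimension_profiles(item_scores, factor_items, clusters):
    """item_scores: (n_jobs, n_items) mean time-spent ratings.
    factor_items: dict mapping factor name -> indices of items loading on it.
    clusters: length-n_jobs list of cluster labels.
    Returns dict: cluster -> {factor -> mean standardized dimension score}."""
    n_jobs = item_scores.shape[0]
    z = {}
    for f, items in factor_items.items():
        raw = item_scores[:, items].sum(axis=1)   # sum of items loading on factor
        z[f] = (raw - raw.mean()) / raw.std()     # standardize across jobs
    profiles = {}
    for c in set(clusters):
        idx = [i for i in range(n_jobs) if clusters[i] == c]
        profiles[c] = {f: float(z[f][idx].mean()) for f in z}
    return profiles
```

Factors rated "above the mean" for a cluster are then simply those with a positive mean standardized score.<br />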
Table 3<br />
Important generalized work behavior factors by occupational cluster<br />
I. General Business and Administration<br />
A. General management and supervisory functions<br />
B. Evaluating programs and ensuring compliance with regulations<br />
C. Gathering, classifying and organizing information<br />
D. Budgeting and accounting functions<br />
E. Planning and developing policy and procedures<br />
F. Computer utilization<br />
II. Claims Examining Occupations<br />
A. Application of rules and regulations - making determinations<br />
B. Investigating and arbitrating<br />
III. Law Enforcement Occupations<br />
A. Police functions<br />
B. Investigating and arbitrating<br />
C. Interviewing<br />
D. Application of rules and regulations - making determinations<br />
IV. Public Information Occupations<br />
A. Dissemination of information<br />
B. Interviewing<br />
V. Industrial/Labor Relations<br />
A. Investigating and arbitrating<br />
B. Interviewing<br />
C. Evaluating programs and ensuring compliance with regulations<br />
VI. Specialized Program Analysis<br />
A. Gathering, classifying and organizing information<br />
SUMMARY<br />
This exploratory study was one of the first applications of the GWB's.<br />
Certainly, the GWB's need refinement but, at this stage of development, the<br />
results look promising. The results obtained in this study make sense<br />
intuitively in terms of the GWB's performed the most and least across jobs,<br />
job dimensions, and clusters of related jobs.<br />
REFERENCES<br />
Ford, J.K., MacCallum, R.C., & Tait, M. (1986). The application of exploratory<br />
factor analysis in applied psychology: A critical review and analysis.<br />
Personnel Psychology, 39, 291-314.<br />
Leaman, J., & Steinberg, A.G. (1990). Factor analysis versus CODAP<br />
hierarchical clustering for a leadership task analysis. Paper presented at<br />
the 98th Annual American Psychological Association Conference, Boston, MA.<br />
Outerbridge, A.N. (1981). The development of generalizable work behavior<br />
categories for a synthetic validity model. Washington, D.C.: U.S. Office of<br />
Personnel Management, Personnel Research and Development Center.<br />
SAS Institute, Inc. (1985). SAS user's guide: Statistics (Version 5). Cary,<br />
NC: SAS Institute, Inc.<br />
U.S. Equal Employment Opportunity Commission, U.S. Civil Service Commission,<br />
U.S. Department of Labor, & U.S. Department of Justice. (1978). Uniform<br />
guidelines on employee selection procedures. Federal Register, 43(166),<br />
38290-38303.<br />
A COMPARISON OF HOLISTIC AND TRADITIONAL<br />
JOB-ANALYTIC METHODS<br />
Brian S. O'Leary, Julie Rheinstein, and Donald E. McCauley, Jr.<br />
U.S. Office of Personnel Management<br />
Washington, D.C.<br />
INTRODUCTION<br />
Job analysis is the foundation of many personnel systems including<br />
selection, performance appraisal, and training. Most often,<br />
lengthy inventories are developed and administered to job<br />
incumbents. This process can be very time-consuming and cost-intensive.<br />
Several researchers have looked at methods of reducing the time and<br />
the cost of job analysis. Grouping jobs on the basis of work<br />
behaviors provides one way of reducing the cost of examination<br />
development while not sacrificing test validity. Barnes and<br />
O'Neill (1978) grouped jobs for examination development in the<br />
Canadian Public Service. Rosse, Borman, Campbell, and Osborn<br />
(1984) clustered U.S. Army enlisted jobs into homogeneous groups<br />
according to rated job content in order to choose a representative<br />
sample of MOS's for test validation purposes. Rosse et al.<br />
clustered the jobs by sorting them on the basis of holistic job<br />
descriptions.<br />
Using a methodology similar to that used by Rosse et al.,<br />
Rheinstein, McCauley, and O'Leary (1989) compared sources of job<br />
information (i.e., the people doing the sorts). McCauley, O'Leary,<br />
and Rheinstein (1989) compared the job groupings that resulted when<br />
the sorters received varying amounts of job information. These<br />
studies provided some of the data to be presented below.<br />
The purpose of the present study was to compare a traditional<br />
method of job analysis (administering an inventory to a large<br />
sample of job incumbents) to the more holistic methods described<br />
above.<br />
METHOD<br />
Data Collection for the Holistic Methods<br />
A) Eighty-seven professional and administrative occupations in the<br />
Federal civilian work force were studied. Personnel research<br />
professionals and staffing specialists grouped the occupations into<br />
categories according to similarity of work behaviors. These raters<br />
were given descriptions of the 87 jobs which were taken from the<br />
Federal Government's Handbook of Occupational Groups and Series of<br />
Classes (1964). The job descriptions consisted of the job title<br />
and a brief narrative which summarized the major duties of the job.<br />
These job descriptions were printed on 5 x 9 cards and given to the<br />
raters for sorting. The General Schedule (GS) series numbers were<br />
not included. Raters were asked to sort the jobs according to<br />
similarities in work behaviors. No limitations were put on the<br />
number of categories each rater could generate.<br />
Two groups completed the sort: (1) nine members from the Office of<br />
Personnel Research and Development (OPRD) at the U.S. Office of<br />
Personnel Management consisting of eight personnel research<br />
psychologists and a personnel staffing specialist (the<br />
"psychologists") and (2) seven personnel staffing specialists from<br />
seven different federal agencies (the "staffing specialists").<br />
B) A second group of staffing specialists sorted just the job<br />
titles. The GS series numbers were not included. These raters<br />
also were asked to sort the jobs according to what they perceived<br />
to be similarities in work behaviors based on the job titles. No<br />
limitations were put on the number of categories the rater could<br />
generate.<br />
The categories resulting from each of the sorts were transformed<br />
into an 87 by 87 matrix for each rater wherein a one in a cell<br />
indicated that those two jobs were placed in the same category by<br />
the rater and a zero in a cell indicated that the two jobs were not<br />
placed together. The matrices thus derived were added together<br />
producing three summary matrices - one for the psychologists, one<br />
for the staffing specialists using job descriptions, and one for<br />
the staffing specialists using job titles only. These matrices<br />
were then factor analyzed. The six-factor solutions accounted for<br />
68.1% of the variance for the psychologists, 70.4% for the staffing<br />
specialists using the job descriptions, and 68.3% for the staffing<br />
specialists using job titles only. The overall agreement between<br />
the psychologists and the staffing specialists using the job<br />
descriptions was 60%. There was an agreement of 56.6% in the<br />
classification of the jobs between the staffing specialists using<br />
the job descriptions and the staffing specialists using job titles<br />
only.<br />
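The sort-to-matrix step described above can be sketched compactly in Python. The toy sorts and data structures below are illustrative assumptions; the subsequent factor analysis is not shown, since the paper does not detail its extraction method.<br />

```python
import numpy as np

def cooccurrence_matrix(sort, n_jobs):
    """sort: list mapping job index -> category label from one rater's sort.
    Returns an n_jobs x n_jobs 0/1 matrix; cell (i, j) is 1 when the rater
    placed jobs i and j in the same category, 0 otherwise."""
    m = np.zeros((n_jobs, n_jobs), dtype=int)
    for i in range(n_jobs):
        for j in range(n_jobs):
            if sort[i] == sort[j]:
                m[i, j] = 1
    return m

def summary_matrix(sorts, n_jobs):
    """Add the per-rater matrices together, as in the study, yielding a
    co-occurrence count for every pair of jobs across a group of raters."""
    return sum(cooccurrence_matrix(s, n_jobs) for s in sorts)
```

The resulting summary matrix (one per rater group) is what would then be submitted to factor analysis to extract job groupings.<br />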
Data Collection for the Job Inventory Method<br />
A five-section inventory that included a section of generalized<br />
work behaviors (GWB's) developed specifically for the professional<br />
and administrative occupations under study was administered to job<br />
incumbents. Approximately 14,000 inventories were sent out to<br />
incumbents, and approximately 6,000 inventories were completed and<br />
returned. As part of the inventory, incumbents were first asked to<br />
read all the GWB's and then check the ones they perform.<br />
Incumbents were then asked to rate the GWB's in terms of relative<br />
time spent using a five-point scale ranging from "1 -Very much<br />
below average time" to "5 - Very much above average time." Mean<br />
time spent ratings were calculated for each GWB for each job.<br />
These means were factor analyzed to produce job groupings. The<br />
six-factor solution accounted for 77.4% of the variance.<br />
Experimental Design<br />
The results of the factor analyses derived from each of the<br />
holistic methods were compared to the results derived from the job<br />
inventory method. For this study, the job inventory method was<br />
considered to be the criterion and the holistic methods were<br />
considered predictors. An agreement was defined as occurring when<br />
a predictor agreed with the criterion concerning the placement of<br />
a job in a group. The percentage of agreement is the total number<br />
of agreements divided by the total number of jobs.<br />
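Under this definition, agreement reduces to a simple label match once the groupings from the two methods have been aligned to a common numbering (as the study's six groupings evidently were). A minimal sketch, with invented labels:<br />

```python
def percent_agreement(criterion, predictor):
    """criterion, predictor: equal-length sequences of group labels, one per
    job, already aligned to a common numbering.  An agreement occurs when the
    predictor places a job in the same group as the criterion."""
    assert len(criterion) == len(predictor)
    agreements = sum(c == p for c, p in zip(criterion, predictor))
    return 100.0 * agreements / len(criterion)
```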
RESULTS<br />
The number of jobs in each grouping for the three holistic methods<br />
and for the job inventory method is shown below in Table 1.<br />
Table 1<br />
Number of Jobs in Each Grouping for Each Method<br />
Grouping: 1 2 3 4 5 6<br />
Job Inventory: 3 45 8 10 17 4<br />
Psychologist (Job Description): 17 24 7 10 16 13<br />
Staffing Specialist (Job Description): 4 34 7 9 14 19<br />
Staffing Specialist (Job Titles): 2 19 12 10 30 14<br />
As one can see from this table, the number of jobs per grouping was<br />
relatively stable across all four methods in Groupings 3 and 4.<br />
Groupings 1, 2, and 5 produced relatively good agreement in terms<br />
of the number of jobs to be included between the job inventory<br />
method and two of the three holistic methods. In Grouping 6, there<br />
was relatively good agreement across the three holistic methods but<br />
not with the job inventory method.<br />
Table 2 below illustrates the degree of agreement between each of<br />
the holistic methods and the job inventory method. In this table,<br />
the number of jobs correctly assigned to each grouping is presented<br />
for each holistic method. The percentage of agreement is also<br />
presented for each holistic method.<br />
Table 2<br />
Number of Jobs Correctly Assigned for Each Holistic Method<br />
Grouping: 1 2 3 4 5 6 | Total | Percentage of Agreement<br />
Psychologist (Job Description): 1 20 5 9 12 2 | 49 | 56.3%<br />
Staffing Specialist (Job Description): 0 25 5 8 10 2 | 50 | 57.5%<br />
Staffing Specialist (Job Titles): 0 18 0 8 13 3 | 42 | 48.3%<br />
The agreement with the job inventory method was relatively similar<br />
for the two groups working with the short job descriptions and<br />
somewhat lower for the group working only with job titles.<br />
When the factor loadings derived from the job inventory method were<br />
examined more closely, it was found that for 19 jobs the difference<br />
between the primary loading and the secondary loading was less than<br />
0.1. This finding indicates that, in terms of generalized work<br />
behaviors, these jobs could be classified equally well in either of<br />
two groupings. It was decided that the definition of agreement<br />
could reasonably be expanded to include agreement with either the<br />
primary or the secondary grouping for these 19 jobs. Under this<br />
revised definition, the percentages of agreement rise to 64.4% for<br />
the staffing specialists working with job descriptions, 63.2% for<br />
the psychologists, and 54% for the staffing specialists working only<br />
with job titles.<br />
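The expanded-agreement rule (count either the primary or the secondary grouping when the top two loadings differ by less than 0.1) can be sketched as follows. For simplicity this assumes each grouping corresponds to one factor; the loading matrix and predictor labels are invented.<br />

```python
import numpy as np

def expanded_agreement(loadings, predictor, margin=0.1):
    """loadings: (n_jobs, n_groups) factor loadings from the inventory method,
    with groups identified with factors.  predictor: group index assigned to
    each job by a holistic method.  Jobs whose top two loadings differ by
    less than `margin` count as agreements if the predictor matches either
    the primary or the secondary grouping."""
    hits = 0
    for job, pred in enumerate(predictor):
        order = np.argsort(-loadings[job])            # factors by descending loading
        primary, secondary = int(order[0]), int(order[1])
        if loadings[job][primary] - loadings[job][secondary] < margin:
            hits += pred in (primary, secondary)      # ambiguous job: either counts
        else:
            hits += pred == primary                   # unique job: primary only
    return 100.0 * hits / len(predictor)
```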
Similar results were obtained when the percentages of agreement<br />
were calculated using only the 68 jobs for which there were unique<br />
factor loadings (i.e., where the difference between the primary and<br />
secondary loadings was greater than 0.1). Using these 68 jobs, the<br />
percentages of agreement were 64.7% for the staffing specialists<br />
using job descriptions, 66.2% for the psychologists, and 55.9% for<br />
the staffing specialists using job titles only.<br />
DISCUSSION<br />
The findings of this study are somewhat hard to interpret.<br />
Agreements of 56% to 66% are too high to conclude that the holistic<br />
methods have no merit for the purpose of grouping jobs but not high<br />
enough to advocate their replacing traditional job inventory<br />
procedures. The cause of this inability to make a clear<br />
determination may well be the criterion measure itself (i.e., a job<br />
inventory based on work behaviors) since there was extremely high<br />
agreement between holistic and traditional approaches when the jobs<br />
were viewed in terms of ability requirements rather than work<br />
behaviors (Rheinstein, O'Leary, and McCauley, 1990).<br />
There are two factors that should be examined as causing this lack<br />
'of clarity in the criterion. The first is the nature of the jobs<br />
under study. Agreement was consistently higher across all four<br />
methods for some groupings (Groupings 4 and 5) than for others.<br />
The jobs within Group 4 were primarily enforcement jobs, and those<br />
in Group 5 were primarily jobs dealing with claims examining. The<br />
jobs in the other groups were more general in nature. The fact<br />
that there was no clear factor loading for 19 jobs (21.8%) means<br />
that there was much overlap of work behaviors among the jobs and<br />
that they could be equally well grouped in more than one way.<br />
The second factor to consider is the use of generalized work<br />
behaviors. It may be that the 57 GWB's used in this study were not<br />
sufficient to distinguish clearly among the 87 jobs. This<br />
hypothesis is supported by the fact that when the job-specific<br />
duties were grouped to develop the GWB's, there were 42 duties (or<br />
3% of the total number of duties) which could not be classified<br />
into one of the 57 GWB's (O'Leary, Rheinstein, and McCauley, 1990).<br />
The development and use of additional GWB's could add other<br />
dimensions upon which groupings would differ more distinctly,<br />
thereby facilitating the assignment of jobs.<br />
Despite the shortcomings mentioned above, the use of elements such<br />
as the GWB shows promise for grouping jobs on the basis of work<br />
behaviors. An inventory that consisted of truly job-specific<br />
duties (or tasks) would not only be unwieldy but would also not<br />
permit grouping of jobs because there would be little or no overlap<br />
of work behaviors across jobs.<br />
Until further advances are made in this area, the question of the<br />
efficacy of holistic methods of job grouping remains unresolved.<br />
However, the degree of agreement obtained in this study argues for<br />
pursuing research in this area.<br />
REFERENCES<br />
Barnes, M. & O'Neill, B. (1978). Empirical analysis of selection<br />
test needs for 10 occupational groups in the Canadian Public<br />
Service. Paper presented to the meeting of the Canadian<br />
Psychological <strong>Association</strong>, Ottawa, June, 1978.<br />
McCauley, D.E., O'Leary, B.S., & Rheinstein, J. (1989). A<br />
comparison of two holistic rating methods for grouping<br />
occupations. Presentation at the Conference of the Military<br />
Testing Association, San Antonio, TX.<br />
O'Leary, B.S., Rheinstein, J. & McCauley, D.E. (1990). Developing<br />
job families using generalized work behaviors. Presentation at<br />
the Conference of the Military Testing Association, Orange<br />
Beach, AL.<br />
Rheinstein, J., O'Leary, B.S., & McCauley, D.E. (1990). Addressing<br />
the issue of "quantitative overkill" in job analysis.<br />
Presentation at the Conference of the Military Testing<br />
Association, Orange Beach, AL.<br />
Rheinstein, J., McCauley, D.E., & O'Leary, B.S. (1989). Grouping<br />
jobs for test development and validation. Presentation at the<br />
Conference of the International Personnel Management<br />
Association Assessment Council, Orlando, FL.<br />
Rosse, R.L., Borman, W.C., Campbell, C.H., & Osborn, W.C. (1984).<br />
Grouping Army occupational specialties by judged similarity.<br />
Unpublished paper.<br />
U.S. Civil Service Commission. (1964). Handbook of occupational<br />
groups and series of classes. Washington, DC: U.S. Civil<br />
Service Commission.<br />
DeLayne R. Hudspeth<br />
Paul R. Fayfich<br />
The University of Texas at Austin<br />
John S. Price, SQNLDR, RAAF<br />
Air Force Human Resources Laboratory<br />
This research was conducted at the Air Force Human Resources Laboratory<br />
(AFHRL) under the 1990 Summer Research Program for faculty and graduate<br />
students, sponsored by the Air Force Office of Scientific Research.<br />
Introduction<br />
The USAF Occupational Measurement Squadron (OMSQ), Randolph Air Force Base<br />
(AFB), Texas is responsible for the preparation, administration, and analysis<br />
of USAF occupational surveys. Using current procedures, the time from<br />
initial survey mail-out to initial data processing ranges from seven to nine<br />
months. Current methods for collecting and processing data for occupational<br />
analysis studies are slow, complicated, and expensive. OMSQ has requested<br />
AFHRL to investigate a more efficient system of administering occupational<br />
surveys. Of particular interest is the possibility of automating the process<br />
using personal computers and the use of the Defense Data Network for<br />
distribution of surveys and collection of responses.<br />
Objectives of the Research Effort<br />
Five objectives for the effort were determined: (1) to create a<br />
computerized version of the Chapel Management job survey (chosen because it<br />
was about to be administered via traditional means); (2) to prepare and<br />
execute a research design for comparing the two forms of administration; (3)<br />
to collect data; (4) to analyze the data and describe the results; and (5) to<br />
provide recommendations for further research and development.<br />
The computerized job inventory for use in the survey was developed using<br />
Microsoft's QuickBasic version 4.5. This third generation language allowed us<br />
to write, test and place software in the field in less than four weeks.<br />
Modular development and formative evaluation were used throughout this<br />
process.<br />
The software consists of two independent modules that are chained<br />
together. The first module contains the Biographical and Background sections<br />
of the survey, and has 13 subprocedures and 2576 lines of code. The second<br />
module is the Duty-Task Section and has 18 subprocedures and 1341 lines of<br />
code. It contains two major procedures: the first has the job incumbent<br />
reviewing each of 407 tasks and identifying the tasks performed in his or her<br />
job; the second procedure has the job incumbent rating relative time spent,<br />
with a nine-point scale, on only those tasks identified in the first<br />
procedure. In both procedures the incumbents could "backup" to review or<br />
change answers and ratings.<br />
To summarize, information generated from the second module included the<br />
identification of tasks performed by incumbents in their present job and time<br />
rating data for each task. The program also collects data on the amount of<br />
time spent in using the module, how many times an incumbent backed up and<br />
changed answers, and how many error messages were written to the screen.<br />
All data were written out to data files and captured on floppy disk.<br />
Research Design<br />
The two independent variables were form and sequence: the two forms were<br />
paper/pencil (P) and computer-based (C); and the three types of sequence<br />
were: (1) P followed by P, (2) P followed by C, and (3) C followed by P.<br />
(Note: Although a fourth variation of C followed by C would have been<br />
desirable, it was a condition of using this population that all persons had<br />
to take a paper/pencil survey. We judged that asking airmen to take the same<br />
survey three times would affect reliability of the data.)<br />
Test/re-test procedures were used for comparing the two survey<br />
administrations and each respondent served as his or her own control. This<br />
was necessary as no assumption could be made about the central tendency and<br />
dispersion of scores. Each respondent had a unique pattern of responses which<br />
described his or her job.<br />
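With each respondent as his or her own control, the basic comparison is between one person's two task-selection patterns. A minimal sketch in modern Python, with invented task numbers, computing the match percentage and the tasks gained and lost between administrations:<br />

```python
def selection_overlap(first, second):
    """first, second: sets of task numbers a respondent selected at Time 1
    and Time 2.  Returns (% of Time-1 tasks also selected at Time 2,
    tasks gained at Time 2, tasks lost at Time 2)."""
    both = first & second
    pct_match = 100.0 * len(both) / len(first) if first else 0.0
    return pct_match, second - first, first - second
```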
The three treatments were:<br />
Treatment / Time 1 (T1) / Time 2 (T2) / Original N<br />
Treatment #1..... P followed by P (P-P) / 40<br />
Treatment #2..... P followed by C (P-C) / 20<br />
Treatment #3..... C followed by P (C-P) / 21<br />
In terms of elapsed time between administrations, a number of factors were<br />
considered. A review of the literature generally supported the decision that<br />
a lapse of two to four weeks between time one and time two would be<br />
acceptable.<br />
The purpose of treatment #1 (P-P) was to provide a baseline against which<br />
the other treatments could be compared with respect to the variability of<br />
test/re-test. Although a perfect match was not expected, a reasonably high<br />
match was anticipated, as jobs seldom change much in two weeks.<br />
The P-C and C-P treatments provided comparison data to examine effects due<br />
to form of administration. For example, the second survey might yield a<br />
higher number of tasks selected than the first, since the first administration<br />
might sensitize incumbents to the nature of their jobs. However, this effect<br />
could be confounded for the P-C and C-P sequences because it is possible that<br />
all C administrations would yield a higher number of responses because C<br />
respondents were forced to look at each separate task, whereas with the paper<br />
version they might accidentally skim past a relevant task statement.<br />
Administration and Data Collection<br />
About May 24, the traditional P survey was dispatched Air Force-wide by<br />
OMSQ using traditional means and methods of distribution. The first<br />
airmen whose surveys were returned to OMSQ were immediately sent a second<br />
P administration with an explanatory letter. By July 5, 30 second returns<br />
were received and used for analysis.<br />
For this research effort, printed surveys for the P-C treatment were<br />
hand-delivered to the Survey Control Officers at Bergstrom, Kelly, and<br />
Randolph AFBs. The computerized version was then given within 2-4 weeks<br />
following the paper administration. For all computerized administrations,<br />
local Z-248 personal computers (PCs) were used. Each respondent used a<br />
separate disk for taking the survey. For the C-P treatment a reverse process<br />
was used starting with Brooks and Lackland AFBs. All paper versions were<br />
machine-scanned by OMSQ to create a data file on disk, which was then matched<br />
and merged with the data strings collected via the PCs.<br />
Results<br />
Any interpretation of the data must take into account that the Chapel<br />
Management Specialty, selected for convenience, may not be generally<br />
representative of all Air Force jobs. Second, the number of airmen for each<br />
treatment was small. Third, there is no "correct" or ideal selection of<br />
either job tasks or the relative time spent ratings, hence statistics that<br />
rely on central tendency could not be used. Finally, a number of more<br />
qualitative techniques such as "think aloud" protocols or follow-up<br />
questionnaires that could have addressed some of the issues raised by<br />
these data were not possible in the given time frame.<br />
Table 1 summarizes data in terms of the total number of tasks selected by<br />
all individuals for each administration, the mean number of tasks selected by<br />
each person, the percent of tasks selected in both the first and second<br />
administrations, and the mean change in number of tasks selected per<br />
individual.<br />
Table 1<br />
Summary of Task Selection Data<br />
Treatment: P1 - P2 (N=30) | P1 - C2 (N=17) | C1 - P2 (N=17)<br />
Total tasks selected: 3,684 3,878 | 1,808 1,950 | 2,418 2,267<br />
Mean each respondent: 123 129 | 106 115 | 142 133<br />
Selected both administrations: 81% | 82% | 75%<br />
Mean change: +6.5% | +8.4% | -8.9%<br />
Another research issue was whether, for the job tasks selected by<br />
incumbents, the estimates of "Time Spent in Present Job" would vary as a<br />
function of type of administration. (This is estimated with a nine-point<br />
scale: 1 = "Very small amount.") Table 2 shows the change in ratings from the<br />
first administration to the second where, for example, for the P1 - P2<br />
administration of 2,988 tasks chosen, 1,283 time ratings were the same, 387<br />
tasks were rated one point higher for more time spent, 342 were rated one<br />
point lower for less time spent, etc.<br />
Table 2<br />
Variation in Time Spent Ratings<br />
Change: P1 - P2 (N = 30) | P1 - C2 (N = 17) | C1 - P2 (N = 17)<br />
-3: 124 | -- | 113<br />
-2: 239 | -- | 229<br />
-1: 342 | 154 | 364<br />
0: 1,285 | 560 | 657<br />
+1: 387 | 231 | 133<br />
+2: 281 | 137 | --<br />
+3: 146 | 84 | 19<br />
+4: 76 | 78 | --<br />
Total (all changes, -8 to +8): 2,988 | 1,488 | 1,812<br />
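The change tabulation underlying Table 2 can be sketched as follows, for the tasks a respondent rated in both administrations; the task IDs and ratings below are invented for illustration:<br />

```python
from collections import Counter

def rating_changes(ratings1, ratings2):
    """ratings1, ratings2: dicts mapping task number -> 1..9 time-spent
    rating from the first and second administrations.  Tallies the
    second-minus-first difference over tasks rated both times."""
    common = ratings1.keys() & ratings2.keys()
    return Counter(ratings2[t] - ratings1[t] for t in common)
```

Summing the tally over respondents in a treatment yields a column of Table 2.<br />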
We also wanted to examine the data which reflected job tasks chosen for<br />
one administration but not the other, as to whether estimates of "Time<br />
Spent..." varied between the two forms of administration. These data are<br />
displayed in graphic form in Tables 3, 4 and 5.<br />
Table 3<br />
Time Ratings P1 - P2, tasks not selected in both administrations (N = 30)<br />
P1 not P2 = 696; P2 not P1 = 890<br />
[Bar chart: number of tasks by rating scale value, 1-9]<br />
Table 4<br />
Time Ratings P1 - C2, tasks not selected in both administrations<br />
P1 not C2 = 320; C2 not P1 = 460<br />
[Bar chart: number of tasks by rating scale value, 1-9]<br />
Table 5<br />
Time Ratings C1 - P2, tasks not selected in both administrations (N = 17)<br />
C1 not P2 = 606; P2 not C1 = 455<br />
[Bar chart: number of tasks by rating scale value, 1-9]<br />
Discussion<br />
Perhaps the most important benefits of this research are that OMSQ now<br />
has a prototype computerized survey, and the potential of automation has been<br />
demonstrated. A job survey was administered with microcomputers (and could be<br />
distributed electronically). The data were captured electronically and<br />
analysis was accomplished in a few hours.<br />
Tables 4 and 5 demonstrate the need for additional research where there<br />
seems to be a disproportionate number of responses for "1" and "5". Informal<br />
feedback suggests that the instructions for estimating "Time Spent on Present<br />
Job" can be interpreted in more than one way. This suggests that further<br />
study of the effects of the wording of these instructions is warranted.<br />
The P-P baseline data, compared with P1-C2, suggest that these forms of<br />
administration are comparable in terms of test/re-test. They both indicate<br />
that taking the inventory causes increased sensitivity to one's job in terms<br />
of number of tasks chosen. If something like the Hawthorne Effect were<br />
operating, an increase in tasks selected in P for the C-P treatment should be<br />
evident, and is not found. Data that support the confounding effects of<br />
having to see each item on the computer are evident. In Table 1, for P1-C2<br />
and C1-P2, there was an increase in the mean number of items selected of 8.4%<br />
when using the computer for the second administration and a decrease of 8.9%<br />
when using paper.<br />
The computer version also demonstrated, on a limited basis, how a survey<br />
can be "branched" or tailored to only display job tasks relevant for a given<br />
person based on prior responses. Improved accuracy and significant time<br />
savings could result. Also, the ability of the computer to process data<br />
during the administration of a job survey could result in new methods and<br />
levels of review by job incumbents.<br />
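The branching idea mentioned above, displaying only tasks relevant to a respondent's prior answers, can be sketched as a simple filter; the duty areas and task statements below are invented for illustration, not taken from the Chapel Management inventory:<br />

```python
def branched_tasks(task_bank, duty_areas_claimed):
    """task_bank: dict mapping duty area -> list of task statements.
    duty_areas_claimed: duty areas the respondent said apply to his or her
    job in an up-front screening section.  Only tasks in the claimed areas
    are presented for selection and time-spent rating."""
    return [t for area in duty_areas_claimed for t in task_bank.get(area, [])]
```

A respondent who claims only one duty area would then see only that area's tasks instead of all 407.<br />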
Recommendations<br />
We recommend that the USAF begin immediately to develop computerized job<br />
inventories. In particular: (1) a comprehensive electronic network needs to<br />
be designed that will allow OMSQ to electronically administer, process and<br />
archive occupational surveys ("Personnel Concept III" (PC-III), currently<br />
under development by Air Force Military Personnel Center, with gateways to<br />
each Air Force Consolidated Base Personnel Office, should be investigated<br />
further); (2) because the computer offers display, review and reporting<br />
capabilities not available with traditional paper administration, we strongly<br />
recommend that planning efforts to use this capability be undertaken as soon<br />
as possible; and (3) research is needed to optimize design of computerized<br />
surveys which might include branching or tailoring, differential feedback<br />
based on individual patterns of response, procedures for review and<br />
correction of responses by incumbents, and other survey design features<br />
which are unique to the computer.<br />
This report describes research which compared paper and pencil versus<br />
computer-based administration of a USAF Job Inventory for the Chapel<br />
Management Specialty. Test/re-test administration procedures, with each<br />
subject acting as his or her own control, were used. The data show there was<br />
an 81% match for Paper (P) followed by P, 82% for P followed by Computer (C),<br />
and 75% for C followed by P. The data suggest that computer-based<br />
administration will improve the yield of tasks chosen, but that its use to<br />
collect estimates of "Time Spent..." ratings is problematic with the current<br />
survey instructions. Computerizing of job surveys is feasible. Additional<br />
efforts are needed to create a valid, reliable Air Force wide automated<br />
system.<br />
MPT ENHANCEMENTS TO THE<br />
OCCUPATIONAL RESEARCH DATA BANK<br />
Joe Menchaca, Jr., Capt, USAF<br />
Jody A. Guthals, 2Lt, USAF<br />
Air Force Human Resources Laboratory (AFHRL/MOD)<br />
Brooks Air Force Base, Texas<br />
Lou Olivier, Glenda Pfeiffer<br />
OAO Corporation<br />
INTRODUCTION<br />
The Occupational Research Data Bank (ORDB) is an on-line, data<br />
repository providing users immediate access to a variety of<br />
occupational information about Air Force specialties (AFS) and the<br />
people who perform duties in them. The combination of several<br />
unique subsystems gives ORDB the ability to retrieve many otherwise<br />
dispersed sets of data from a consolidated data bank. Instead of<br />
the normal laborious and time-consuming task of finding personnel<br />
background information by formal requests to computer data bases,<br />
searching Air Force regulations, or searching a library of<br />
technical reports and previous studies, the ORDB allows the user to<br />
streamline occupational data retrieval by providing easy access to<br />
data from all these sources. Two years ago a paper was presented<br />
discussing some planned enhancements and applications of the ORDB<br />
to assist manpower, personnel, and training (MPT) decision makers<br />
and analysts in the acquisition of Air Force weapon systems<br />
(Longmire and Short, 1988). The purpose of this paper is to<br />
describe implementation of these enhancements and discuss some<br />
actual MPT applications.<br />
BACKGROUND/OVERVIEW<br />
Plans for the development of the ORDB began in 1978. While<br />
vast quantities of information were available about Air Force<br />
occupations, the data were widely dispersed among many different<br />
organizations, with varying formats and degrees of<br />
coverage. At that time, the Air Force Human Resources Laboratory<br />
(AFHRL) maintained 29 different types of computer files from by<br />
many different sources. Also, AFHRL housed Air Force technical<br />
reports dating back to 1943 and was the official Air Force<br />
repository of all occupational study data files generated by the<br />
USAF Occupational Measurement Center (USAFOMC). Other organizations<br />
(HQ USAF, ATC, AFMPC, etc.) had their own data bases and generated<br />
numerous recurring reports, regulations, and studies.<br />
Occupational researchers needed consolidated information that was<br />
easily and rapidly accessible.
The ORDB was designed and continues to reside on the AFHRL<br />
UNISYS 1100/82 mainframe at Brooks AFB, Texas. The programs within<br />
ORDB were created in a user-friendly, tutorial environment so that<br />
even the most novice of computer users could access its<br />
information. Beyond the original scope of ORDB's development, the<br />
current enhancements to the system focus on ways to make the system<br />
more useful to a variety of users such as researchers, OMC<br />
analysts, and MPT managers who determine MPT requirements for<br />
already existing weapon systems and who must forecast similar<br />
requirements early in the planning stages of new weapon system<br />
acquisitions (Longmire and Short, 1988).<br />
The ORDB provides storage and on-line retrieval of a variety<br />
of occupational data within its seven major subsystems. Figure 1<br />
diagrams the ORDB. It also shows each subsystem's primary area of<br />
use. The check marks within the circles indicate new subsystems<br />
which are described below.<br />
(1) The CODAP (Comprehensive Occupational Data Analysis Programs)<br />
Subsystem allows rapid retrieval of reports from the most recent<br />
occupational study on an AFS.<br />
(2) The Enlisted AFSC Information Subsystem (EAIS) contains AFSC<br />
descriptions (for ladder and career field), progression ladders,<br />
and prerequisites for the years 1978 to the present, and number<br />
change history (1965 - present).<br />
(3) The Officer AFSC Information Subsystem (OAIS) allows retrieval<br />
of officer AFSC information similar to that available in the EAIS<br />
(1976 - present).<br />
Figure 1. ORDB Subsystems<br />
(4) The Computer-Assisted Reference Locator (CARL) provides<br />
listings of occupational studies, technical reports, films, and<br />
other documents related to Air Force jobs.<br />
(5) The Enlisted Statistical Subsystem (ESS) provides statistical<br />
distributions of selected data elements for enlisted personnel on<br />
the Uniform Airman Record (UAR) file at the end of the calendar<br />
year as well as personnel with records on the Pipeline Management<br />
System (PMS) file (1987 - present).<br />
(6) The Archived Statistics Subsystem contains pre-generated<br />
statistics on demographic, aptitude, education, training, turnover,<br />
and duty-related information on Air Force enlisted personnel,<br />
previously generated for calendar years 1980-1986. The CY 89 task<br />
phased out pre-generated statistics, which are now accessed from<br />
this subsystem.<br />
(7) The Weapon System Information Subsystem (WSIS) permits access<br />
and retrieval of Air Force occupational information by weapon<br />
system, special experience identifier (SEI), or AFSC.<br />
The capabilities which ORDB developers are seeking are best<br />
summarized as an up-to-date occupational research data base,<br />
containing a wide variety of both historical and current<br />
information on United States Air Force enlisted and officer career<br />
fields. The CARL subsystem is continually updated as material<br />
becomes available; AFSC descriptions are updated semi-annually;<br />
Occupational Measurement Center study reports are loaded into the<br />
system on a continual basis as soon as an analysis is complete; and<br />
all modifications are documented in the User's Manual and/or<br />
Procedural Guide on a day-to-day basis (Olivier, Pfeiffer, & Menchaca, 1990). Overall,<br />
the conscientious effort to update and maintain the ORDB is the key<br />
to its success.<br />
RECENT ENHANCEMENTS<br />
Presently, research is continuing with the ORDB to facilitate<br />
the planning and analysis of MPT requirements earlier in the weapon<br />
system acquisition process. There are two primary areas<br />
of improvement. First was the development of the Weapon System<br />
Information Subsystem (WSIS). A second major enhancement was the<br />
conversion of the Statistical Variable Subsystem from aggregated<br />
occupational statistics to current user-defined population<br />
statistic variables from the UAR and PMS.<br />
As was mentioned earlier, the WSIS allows users to obtain<br />
information cross referenced between a specific Air Force weapon<br />
system, SEIs and enlisted AFSCs, or any combination thereof. An<br />
enlisted AFSC is a six-character field (e.g., 41131C) including<br />
suffix. Prefixes are not used. Special Experience Identifiers are<br />
three-digit numeric codes which identify special experience not<br />
otherwise reflected in the USAF enlisted classification structure.<br />
SEIs are used to achieve greater flexibility in the management of<br />
personnel, particularly in the quick identification of specially<br />
qualified resources to support contingency operations or<br />
situations. All SEI information was derived from AFR 39-1, Airman<br />
Classification, and within the WSIS has been matched to appropriate<br />
weapons systems and AFSCs.<br />
The WSIS can retrieve information by calendar year beginning<br />
with a base year of 1987. It allows a user to enter a weapon<br />
system and obtain all the related enlisted AFSCs and SEIs, or vice<br />
versa. Weapon system identification was derived from the 1988 AF<br />
Magazine Almanac with the intention of creating a comprehensive<br />
listing of existing/active USAF Weapon Systems including all<br />
airplanes, helicopters, missiles, etc. The data are arranged by<br />
mission type (e.g., Strategic Bombers, Trainers, Helicopters)<br />
with the actual weapon systems listed for each mission type.<br />
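The WSIS cross-referencing described above amounts to a multi-key index over (weapon system, AFSC, SEI) associations per year. A minimal sketch in modern terms follows; the records, identifiers, and function names are all invented for illustration and are not the actual WSIS implementation (which runs on the UNISYS mainframe):

```python
from collections import defaultdict

# (year, weapon_system, afsc, sei) association records -- invented examples
RECORDS = [
    (1987, "F-16", "45131C", "123"),
    (1987, "F-16", "42631A", "456"),
    (1988, "B-1B", "45731B", "789"),
]

def build_index(records):
    """Index each identifier so a lookup works in any direction."""
    index = defaultdict(set)
    for year, ws, afsc, sei in records:
        for key in (ws, afsc, sei):
            index[(year, key)].add((ws, afsc, sei))
    return index

def lookup(index, year, key):
    """Return all (weapon system, AFSC, SEI) triples related to `key`."""
    return sorted(index[(year, key)])

index = build_index(RECORDS)
print(lookup(index, 1987, "F-16"))  # both F-16 triples for 1987
```

Indexing every identifier to its full triple is what makes "enter a weapon system and obtain all the related enlisted AFSCs and SEIs, or vice versa" a single lookup.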
In the past, statistics on 125 variables were computed each<br />
year against the most current UAR, Airmen Gain/Loss (AGL), and PMS<br />
data files and then uploaded into the system. With the new<br />
Enlisted Statistical Subsystem (ESS), this process has recently<br />
changed for the sake of providing current data as soon as it<br />
becomes available. As was mentioned earlier, the ESS is comprised<br />
of records of all enlisted personnel on the UAR file as of 31 Dec<br />
of each calendar year and all personnel with records on the PMS<br />
file who completed training in that year. Statistical data is<br />
requested by Duty AFSC or PMS Course ID. One and two-way<br />
distributions for selected variables are included in the output.<br />
Where appropriate, means, standard deviations, and row/column<br />
counts are also listed.<br />
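The one- and two-way distributions, means, and standard deviations the ESS reports can be sketched as follows. The records and variable names are invented stand-ins, not actual UAR or PMS fields:

```python
from collections import Counter
from statistics import mean, pstdev

# Invented enlisted records standing in for UAR variables
records = [
    {"grade": "E3", "marital": "S", "afqt": 52},
    {"grade": "E3", "marital": "M", "afqt": 61},
    {"grade": "E5", "marital": "M", "afqt": 74},
]

def one_way(records, var):
    """One-way frequency distribution of a single variable."""
    return Counter(r[var] for r in records)

def two_way(records, row_var, col_var):
    """Two-way cross-tabulation; keys are (row value, column value)."""
    return Counter((r[row_var], r[col_var]) for r in records)

print(one_way(records, "grade"))
print(two_way(records, "grade", "marital"))
afqt = [r["afqt"] for r in records]
print(mean(afqt), pstdev(afqt))  # mean and population standard deviation
```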
The UAR as of 31 Dec for each year contains all enlisted<br />
personnel to include active duty projected gains, some recent<br />
losses, etc. AFHRL personnel scrub the file so that the resultant<br />
file contains records of all enlisted personnel on "active duty".<br />
Only certain selected fields are used in the ORDB ESS. A total of<br />
44 UAR variables have been selected for the ESS.<br />
The PMS files contain records of all personnel who attended<br />
training at Air Force Technical Schools. For the purpose of the<br />
ESS, only active duty enlisted personnel who have completed<br />
training in the given year are selected. Four variables have been<br />
selected for the ESS to bring the total to 48 variables. The<br />
variables are listed in Table 1. For a two-way distribution, one<br />
variable must be marked with an asterisk.<br />
At present, information stored in ORDB is AFSC-specific.<br />
Current modifications to the Weapon System Information Subsystem<br />
(WSIS) and the Enlisted Statistical Subsystem will soon yield<br />
additional information by weapon system. This capability should be<br />
available by the end of CY90. The result will be an improved<br />
occupational research data source containing a wide variety of both<br />
historical and current information on enlisted and officer career<br />
fields of the United States Air Force. There has also been a<br />
recent proposal to place the data base on compact disc, for use on<br />
a write-once-read-many (WORM) drive. A significant increase in the<br />
number of users who could access the system would result.<br />
 1  Duty AFSC              18  Primary AFSC                35* Number of Dependents<br />
 2* Secondary AFSC         19* ASVAB-Electronic            36* ASVAB-Mechanical<br />
 3* ASVAB-General          20* ASVAB Admin.                37  Unfavorable Info. File<br />
 4* Subst. Abuse-Lvl.      21  Duty AFSC Prefix            38  Substance Abuse-Type<br />
 5  Primary AFSC Prefix    22* AFQT Score                  39* APR-Most Recent<br />
 6  Secondary AFSC Pre.    23* Current Grade               40* EPR-Most Recent<br />
 7  Base of Assignment     24* Time in Grade               41* Current Flying Status<br />
 8  Major Command          25* TAFMS-Months                42  Current Location<br />
 9  SEI-PAFSC-1st          26* Cat. of Enlist.             43  SEI-PAFSC-2nd<br />
10* Age-Years              27  SEI-PAFSC-3rd               44  Security Clearance<br />
11  SEI-PAFSC-4th          28* Training Status             45  SEI-PAFSC-5th<br />
12  Ethnic Group           29  Mental Category             46  Cat. of Enlisted Status<br />
13* Marital Status         30  Academic Ed.                47* Program Element Code<br />
14* PMS Training Length    31* Sex                         48* Conus-Overseas<br />
15  PMS Final Rate         32  Race<br />
16  PMS Course ID (AFSC)   33  Prof. Military Education<br />
17  PMS Term. Reason       34  Military Status of Spouse<br />
Table 1. ESS VARIABLES<br />
MPT APPLICATIONS<br />
ORDB relates many dispersed sets of data into a consolidated,<br />
rapidly accessible data base. Instead of the normal laborious and<br />
time-consuming task of finding background information by formal<br />
requests to computer data bases, searching Air Force regulations,<br />
or searching a library of technical reports and previous studies,<br />
the ORDB allows users to streamline data retrieval while saving<br />
computer resources. ORDB is valuable for aiding research design,<br />
conducting historical and cross-specialty analyses, and guarding<br />
against duplication of effort and inconsistencies between data<br />
bases. ORDB access facilitates planning and analysis support of<br />
MPT requirements earlier in the weapon system acquisition process.<br />
The WSIS is proving to be helpful to MPT planners and analysts<br />
requiring occupational information by AFSC or total weapon system.<br />
Researchers within AFHRL are primary users of the ORDB. A<br />
recent example of the ORDB's many uses was a CODAP retrieval of all<br />
duty descriptions for certain AFSCs. These descriptions were then<br />
used in the development of a taxonomy determining skill knowledge<br />
and ability to be used in weapon system acquisition to determine<br />
MPT requirements. The ORDB has been identified as a key component<br />
of several high-priority AFHRL research projects. Some of these<br />
are the Training Decisions System (TDS), the Advanced On-the-Job<br />
Training System (AOTS), Job Performance Measurement, and the Basic<br />
Job Skills Project. The Advanced On-the-Job Training System (AOTS)<br />
program used the ORDB CODAP subsystem for its initial research.<br />
Future use of the ORDB to support the program at the base level,<br />
called the Base Training System (BTS), is presently being<br />
considered. The proposed portable WORM-drive ORDB would enable<br />
more people to use the ORDB.<br />
The ORDB is a critical resource to projects underway as part<br />
of the MPT Integration effort. Work at ASD/ALH, the Air Force MPT<br />
Directorate, continues to require use of the ORDB. DOD Directive<br />
5000.53 calls for MPT integration early in weapon system<br />
acquisition. ASD/ALH made extensive use of the ORDB in April 1989<br />
when a rapid analysis of MPT and safety factors for the A-16 was<br />
called for. Included in this analysis was a data retrieval of the<br />
target population, maintenance personnel, and demographics. The<br />
study's objective was to determine which maintenance personnel<br />
were applicable to the A-16 and what their jobs entail.<br />
There are several other key ORDB users. At the Occupational<br />
Measurement Center, the ORDB is used to provide quick in-depth<br />
orientation to AFSCs and as a rapid response tool to high level<br />
management queries. The Training Performance Data Center (TPDC) in<br />
Orlando, Florida, has benefitted from accessing the system to<br />
obtain prompt, up-to-date data on Air Force specialty structures<br />
which have in turn been made available to a number of DOD agencies.<br />
TPDC researchers will soon be providing the Laboratory with a<br />
process mapping equipment to occupations, which will be a vital<br />
component in the Weapon System Information Subsystem's development.<br />
The Air Force Management Engineering Agency (AFMEA) is hoping to<br />
use the ORDB as a cross reference for manpower studies. AFMEA is<br />
conducting special interest studies in support of an Air Staff<br />
requested study to determine which career fields report excessive<br />
man hours. Finally, AFMEA is doing a comparison of skill and<br />
experience to determine changes in the force structure. The ORDB<br />
will provide needed information to find a relative value of<br />
experience in Air Force personnel.<br />
Access to the ORDB by users outside the Laboratory is<br />
available via commercial and DSN telephone lines and through the<br />
Defense Data Network (DDN), a capability which conveniently serves<br />
a number of outside agencies currently having or requesting access.<br />
REFERENCES<br />
Longmire, K. M., and Short, L. O. (1988, December). The<br />
Occupational Research Data Bank: A Key to MPTS Analysis.<br />
Proceedings of the 30th Annual Conference of the Military<br />
Testing Association (pp. 262-267). Arlington, VA.<br />
Longmire, K. M., and Short, L. O. (1989, July). Occupational<br />
Research Data Bank: A Key to MPTS Analysis Support<br />
(AFHRL-TP-88-71). Brooks AFB, TX: Manpower and Personnel<br />
Division, Air Force Human Resources Laboratory.<br />
Olivier, L., Pfeiffer, G., and Menchaca, J., Jr. (1990, January).<br />
Occupational Research Data Bank User's Manual<br />
(AFHRL-TP-89-62). Brooks AFB, TX: Manpower and<br />
Personnel Division, Air Force Human Resources Laboratory.<br />
ASCII CODAP: PROGRESS REPORT ON APPLICATIONS<br />
OF ADVANCED OCCUPATIONAL ANALYSIS SOFTWARE *<br />
William J. Phalen, Air Force Human Resources Laboratory<br />
Jimmy L. Mitchell, McDonnell Douglas Missile Systems Company<br />
Darryl K. Hand, Metrica, Inc.<br />
Abstract<br />
The development of automated procedures for selecting job and task module types from a<br />
hierarchical clustering solution and the interpretive software associated with these procedures were<br />
reported at the 1987 and 1988 MTA conferences. Over the last two years, operational testing and<br />
evaluation of this software has demonstrated its value in terms of enhanced analytic capabilities and<br />
accelerated completion of the analytic process. This report provides informative examples and<br />
experiences to illustrate how complex analyses have been accomplished by using the job and task<br />
module type selection and interpretation software to extract, organize, and display latent bits of<br />
relevant information from a CODAP database.<br />
Introduction<br />
The principal occupational analysis technology in the United States Air Force is the<br />
Task Inventory/Comprehensive Occupational Data Analysis Programs (CODAP) approach.<br />
This system has supported a major occupational research program within the Air Force<br />
Human Resources Laboratory (AFHRL) since 1962 (Morsh, 1964; Christal, 1974), and an<br />
operational occupational analysis capability within Air Training Command’s USAF<br />
Occupational Measurement Squadron since 1967 (Driskill, Mitchell, & Tartell, 1980;<br />
Weissmuller, Tartell, & Phalen, 1988). The CODAP system is now used by all the U.S. and<br />
many allied military services, as well as a number of other government agencies, academic<br />
institutions, and some private industries (Christal & Weissmuller, 1988; Mitchell, 1988).<br />
Recently, the CODAP system was rewritten to make it more efficient and to expand its<br />
capabilities (Phalen, Mitchell & Staley, 1987). In the process of developing this new ASCII<br />
CODAP system, several major innovative programs were created to extend the capabilities<br />
of the system for assisting analysts in identifying and interpreting potentially significant jobs<br />
(groups of similar cases) and task modules (groups of co-performed tasks). Initial<br />
operational tests of these automated analysis programs were conducted and preliminary<br />
results were reported at previous conferences (Phalen, Staley, & Mitchell, 1988; Mitchell,<br />
Phalen, Haynes, & Hand, 1989).<br />
Over the last two years, operational testing and evaluation of new interpretive software<br />
has continued and these programs have demonstrated their value in terms of enhanced<br />
analytic capabilities and their potential to accelerate completion of an occupational analysis.<br />
Some of these programs have been released into the operational version of ASCII CODAP<br />
while others remain experimental; i.e., they are not yet in final operational form. In this<br />
presentation, we want to provide some examples of this continuing work. Such examples will<br />
also serve to illustrate how complex analyses can be accomplished more expeditiously by<br />
using the job and task module type interpretation software to extract, organize, and display<br />
latent bits of relevant information from an occupation-specific CODAP database.<br />
* Approved for Public Release; Export Authority 22CFR125.4 (b)(13).<br />
A Suite of Advanced Interpretive Assistance Programs<br />
A set of seven programs has evolved gradually over the last few years which are meant to<br />
assist analysts in interpreting job and task clusters; some of these were completed in time to<br />
be released with the initial version of ASCII CODAP. Others are still being refined and thus<br />
are not yet ready for operational use. It is helpful to have an overview of the entire set of<br />
programs, so everyone can see how the programs relate to one another and to their ultimate<br />
objective. These programs are shown in Figure 1 below.<br />
                                        Case Clusters     Task Clusters<br />
                                        (Job Types)       (Task Modules)<br />
Identify Appropriate Clusters           JOBTYP            MODTYP<br />
Identify/Display Core Tasks             CORTAS            TASSET<br />
Identify/Display Core Cases             CASSET            CORCAS<br />
Relationship of Task Clusters<br />
  to Job Clusters                               JOBMOD<br />
Figure 1. The Set of Advanced Interpretive Assistance Programs<br />
(Boldface = operational program in ASCII CODAP; Italic = experimental, not yet released).<br />
The operational programs are briefly described as follows:<br />
JOBTYP automatically identifies stages in most branches of a hierarchical clustering<br />
DIAGRM which represent the “best” candidates for job types. First, core task homogeneity,<br />
task discrimination, a group size weight, and a loss in “between” overlap for merging stages<br />
are calculated for all stages and these values are used to compute an initial evaluation value<br />
(for JOBTYP equations, see Haynes, 1989). This value is used to pick three sets of initial<br />
stages; these are then inserted into a super/subgroup matrix for additional pairwise<br />
evaluation, in order to further refine the selection of candidate job type groups. Three final<br />
sets of stages (primary, secondary, and tertiary groups) are then reported for the analyst to<br />
use as starting points for selecting final job types.<br />
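The stage evaluation JOBTYP performs can be sketched as a weighted combination of the four criteria named above. The weights and combining rule below are invented for illustration; the actual equations are given in Haynes (1989):

```python
# Hedged sketch of a JOBTYP-style composite evaluation per clustering stage.
# Weights and the additive rule are invented; see Haynes (1989) for the
# operational equations.

def evaluate_stage(homogeneity, discrimination, group_size, overlap_drop,
                   w=(1.0, 1.0, 0.5, 1.0)):
    """Combine the four stage criteria into one evaluation value.

    homogeneity    - core task homogeneity (within-group overlap)
    discrimination - task discrimination against other groups
    group_size     - weight favoring reasonably sized groups
    overlap_drop   - loss in "between" overlap at the merging stage
    """
    criteria = (homogeneity, discrimination, group_size, overlap_drop)
    return sum(weight * value for weight, value in zip(w, criteria))

def top_candidates(stages, k=3):
    """Rank stages by evaluation value; return the k best stage ids."""
    ranked = sorted(stages, key=lambda s: evaluate_stage(*s[1:]), reverse=True)
    return [stage_id for stage_id, *_ in ranked[:k]]

stages = [  # (stage id, homogeneity, discrimination, size weight, drop)
    (321, 0.62, 0.40, 0.8, 0.10),
    (384, 0.75, 0.55, 0.6, 0.25),
    (461, 0.31, 0.20, 0.9, 0.05),
]
print(top_candidates(stages, k=2))  # [384, 321]
```

The ranked stages here play the role of the "initial stages" JOBTYP picks before its pairwise super/subgroup refinement.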
CORTAS compares a set of group job descriptions (“contextual” groups) in terms of<br />
number of core tasks performed, percent members performing and time spent on each core<br />
task, and the ability of each core task to discriminate each group from all other groups in<br />
the set. It also computes for each group an overall measure of within-group overlap called<br />
the “core task homogeneity index”, an overall measure of between-group difference called the<br />
“index of average core task discrimination per unit of core task homogeneity”, and an<br />
asymmetric measure of the extent to which each group in the set qualifies as a subgroup or<br />
supergroup of every other group in the set.<br />
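The asymmetric super/subgroup measure can be illustrated with a simple set-containment stand-in; the operational CORTAS index is computed from overlap statistics, so this is only an analogy with invented task sets:

```python
# Analogy for CORTAS's asymmetric super/subgroup measure: the share of
# group A's core tasks that also appear in group B. Asymmetry is the point:
# subgroup_index(a, b) and subgroup_index(b, a) generally differ.

def subgroup_index(core_a, core_b):
    """Fraction of A's core tasks contained in B (1.0 = A looks like a subgroup of B)."""
    return len(core_a & core_b) / len(core_a)

a = {"T1", "T2", "T3"}
b = {"T1", "T2", "T3", "T4", "T5"}
print(subgroup_index(a, b))  # 1.0 -> A qualifies as a subgroup of B
print(subgroup_index(b, a))  # 0.6 -> B is more like a supergroup of A
```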
TASSET compares clusters of tasks (modules) in terms of the degree to which each cluster<br />
of tasks is co-performed with every other task cluster (supergroup/subgroup matrix). Within<br />
each cluster, TASSET computes the average co-performance of each task with every other<br />
task in the cluster (representativeness index) and the difference in average co-performance<br />
of the same tasks with all other task clusters (discrimination index). TASSET also identifies<br />
tasks which meet the co-performance criterion for inclusion in clusters in which they were not<br />
placed (potential core tasks), as well as tasks that are highly co-performed with all clusters<br />
except the cluster under consideration (negatively unique tasks).<br />
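The representativeness and discrimination indices can be sketched directly from a co-performance matrix. The matrix values here are invented, and the operational TASSET definitions may differ in detail:

```python
# Hedged sketch of TASSET's two per-task indices over a task co-performance
# matrix, following the description in the text: representativeness is the
# average co-performance with the task's own cluster; discrimination is that
# average minus the average co-performance with tasks outside the cluster.

def representativeness(co, task, cluster):
    """Average co-performance of `task` with the other tasks in its cluster."""
    others = [t for t in cluster if t != task]
    return sum(co[task][t] for t in others) / len(others)

def discrimination(co, task, cluster, all_tasks):
    """Representativeness minus average co-performance with outside tasks."""
    outside = [t for t in all_tasks if t not in cluster]
    out_avg = sum(co[task][t] for t in outside) / len(outside)
    return representativeness(co, task, cluster) - out_avg

# Symmetric co-performance matrix for four tasks (invented values)
co = {
    "A": {"A": 1.0, "B": 0.8, "C": 0.7, "D": 0.1},
    "B": {"A": 0.8, "B": 1.0, "C": 0.6, "D": 0.2},
    "C": {"A": 0.7, "B": 0.6, "C": 1.0, "D": 0.1},
    "D": {"A": 0.1, "B": 0.2, "C": 0.1, "D": 1.0},
}
cluster = {"A", "B", "C"}
print(round(representativeness(co, "A", cluster), 2))  # 0.75
print(round(discrimination(co, "A", cluster, co), 2))  # 0.65
```

A task like "D" with high co-performance everywhere except its own cluster would surface as a "negatively unique" task under this scheme.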
The experimental programs are as follows:<br />
MODTYP - Just as the JOBTYP program automatically selects from a hierarchical<br />
clustering of cases the “best” set of job types based on similarity of time spent across tasks,<br />
the MODTYP (module typing) program selects from a hierarchical clustering of tasks the<br />
“best” set of task module types based on task co-performance across cases. The term “best”<br />
means that the evaluation algorithm initially optimizes on four criteria simultaneously (i.e.,<br />
within-group homogeneity, between-group discrimination, group size, and drop in “between<br />
overlap” in consecutive stages of the hierarchical clustering). After all stages of the clustering<br />
have been evaluated on these criteria, primary, secondary, and tertiary sets of mutually<br />
exclusive task clusters are selected as first-, second-, and third-best representations of the<br />
modular structure of the hierarchical clustering solution. The three sets of groups are then<br />
input to another evaluation algorithm which computes super- and subgroup indices between<br />
all pairs of groups in the primary solution within the same TPath range. Based on the<br />
combined results of both evaluations, the sets of groups are revised. The final set of primary<br />
groups is input to the TASSET and CORCAS programs to provide analytic and interpretive<br />
data for each primary cluster of tasks. MODTYP output also reports the initial and final sets<br />
of primary, secondary, and tertiary groups and their evaluation indices.<br />
In addition to the data summaries of groups noted above, which can be very complex, a<br />
graph of all final stages in TPATH sequence is generated to help the analyst understand the<br />
relationship among the possible levels of clustering. An example of such a graph is shown<br />
in Figure 2. In this case, Level 1 = primary group; Level 2 = secondary group; and Level<br />
3 = tertiary group. By showing a different symbol for each level, the graph highlights the<br />
most likely choices of groupings (task modules) for the analyst’s consideration. Used in<br />
conjunction with the Task Cluster Diagram, this display provides a quick way for analysts to<br />
make preliminary judgments as to the appropriate groups to select for further evaluation.<br />
MODTYP MODULE TYPING TEST RUN R-1  Avionics Test Station, AFS 451X7                        Page 13<br />
Graph of All Final Stages in TPATH Sequence<br />
                          1                  289                  578                  867                 1157<br />
Stage  TPATH Range  Level +--------------------+--------------------+--------------------+--------------<br />
 321    1 -  72       2   ------<br />
 384    1 -  33       1   Xx<br />
 461   36 -  37       1   X<br />
 575   38 -  40       1   X<br />
 766   42 -  43       1   X<br />
 342   46 -  72       1   XxXx<br />
 333    1 -  43       3   ...<br />
 362   46 -  63       3   ...<br />
 401   64 -  71       3   ..<br />
 365   73 -  84       3   ..<br />
 469   73 -  76       1   X<br />
Figure 2. Example MODTYP Graph of All Final Stages in TPATH Sequence (AFS 451X7)<br />
CORCAS - The CORCAS report characterizes task clusters selected by the analyst for<br />
further evaluation in terms of the people who most perform it, and especially those principal<br />
performers whose jobs are concentrated in this task cluster to the exclusion of all or most<br />
other task clusters. The CORCAS report may contain any type of background variable<br />
information describing a case that will fit in the allocated space, just as on a PRTVAR<br />
report; however, “base of assignment” and “job title” are often the most useful variables. An<br />
example is shown in Figure 3 below.<br />
CORCAS    CORE CASES FOR TASK MODULES                                                          Page 82<br />
Summary Statistics for Target Module ST0046<br />
CS0001 Stage 41: PS0001 435 to 437<br />
<br />
Description                                     Value    Description                                          Value<br />
Number of tasks in target module                    3    Average number of tasks performed by all cases         .05<br />
Number of core cases in target module              11    Average number of tasks performed by core cases       1.82<br />
Percent of module time covered by core cases    70.11    Average percent time spent in module by all cases      .05<br />
Core case homogeneity index (CCHI)              35.48    Average percent time spent in module by core cases    1.29<br />
<br />
Co-performance   Task Title<br />
         22.81   G 208  Evaluate water survival performances of students not wearing pressure suit assemblies<br />
         18.40   K 302  Perform minor repairs of life rafts, such as patching or replacing spray shields<br />
         27.88   K 319  Store life rafts<br />
<br />
Case Level Statistics for Target Module ST0046 (n = 3)<br />
Core Cases Sorted on Average Task Importance Values<br />
<br />
                                                       Average     Number    Percent    Percent   Sorted<br />
                                                       Task        of Core   of Tasks   Time in   Performance<br />
KPATH  Grade  DAFSC  Supvsd.  Base    Job Title        Importance  Modules   Performed  Module    Emphasis<br />
  146  E5     91150  05       Mather  NCOIC Admin           78.52        9     100.00      1.99       117.24<br />
   41  E3     91150  00       Mather  Arspc Physlgy         59.71       15      66.67      1.30        67.49<br />
   40  E3     91130  02       Mather  Ar. Phy. Spec         56.38        7      66.67      1.65        58.84<br />
   13  E5     91170  01       Brooks  Supv Aero Phy         53.47       26     100.00       .54        54.08<br />
  202  E3     91150  00       Mather  Arspc Phy Spec        52.78        5      66.67      1.22        19.51<br />
  139  E5     91150  04       Mather  Asst NCOIC Acad.      42.52        9      66.67      1.40        40.07<br />
Figure 3. Example CORCAS Report Showing Types of Data Which Can Be Displayed<br />
This example illustrates how the program can be useful in interpreting task clusters; in this<br />
case note that almost all cases are individuals assigned to Mather AFB, CA, where the Air<br />
Force conducts its navigator training. By assessing these data in conjunction with the three<br />
tasks in the module, an analyst can begin to make sense out of the tentative task module.<br />
The CORCAS report makes it apparent that the three tasks are co-performed because they<br />
are all a part of the navigator training course at Mather AFB. Note also that the KPATH<br />
number for each case is also shown; this means that by crossmapping KPATH sequences and<br />
analyst-assigned job type names, we could also display the job type for each member (but<br />
would have to sacrifice some other data in order to have room in the display). We have<br />
done this experimentally and found it very useful; in some cases, it leads the analyst to<br />
reconsider the job type names initially assigned.<br />
CASSET - Whereas CORCAS characterizes a task cluster (module) in terms of those cases<br />
whose jobs are most representative of the task module, the CASSET program generates<br />
displays of cases whose jobs are most representative of the job types (group of cases) within<br />
a given set of job clusters. This approach permits an analyst to quickly characterize a job<br />
type by the salient features of its most representative and discriminating members. Like<br />
CORCAS, the CASSET report may contain any type of background variable information<br />
describing a case that will fit in the allocated space, just as on a PRTVAR report, with “base<br />
of assignment” and “job title” often being the most useful variables to aid analysts’<br />
interpretations.<br />
JOBMOD - The JOBMOD (Job Type versus Task Module mapping) program aggregates<br />
the case- and task-level indices computed by the four advanced analysis programs and uses<br />
these aggregate measures to relate task clusters to job types and vice versa. The description<br />
of job types by a handful of discriminant clusters of tasks, and the association of each task<br />
cluster with the types of jobs of which it is an important component, is a basic requirement<br />
for defining and integrating the MPT components of an existing or potential Air Force<br />
specialty or weapons system. If AFSs are to be collapsed or shredded out, or new jobs are<br />
to be assigned to an occupational area, or old jobs are to be moved to another occupational<br />
area, such highly summarized, yet meaningfully discriminant hard data are essential (Phalen,<br />
Staley, & Mitchell, 1989:4-5).<br />
Within a specialty being studied, a JOBMOD printout is generated for each job group<br />
showing the relationships of the set of task modules to the cases representative of the job.<br />
An example of such a printout is given below:<br />
JOBMOD    ANALYSIS OF TASK MODULES WITHIN A JOB GROUP                                          Page 17<br />
ST0035 Centrifuge Operators (n = 5)<br />
<br />
01 = Number of tasks in module<br />
02 = Average Percent Members Performing (PMP) within group performing tasks within the module<br />
03 = Average sum of Percent Time Spent (PTS) for group performing tasks within the module<br />
04 = Percent of most time-consuming task's time covered by tasks in module<br />
05 = Percent of tasks in module which are core tasks for the group<br />
06 = Percent of the group's core tasks which are in the module<br />
07 = Percent of tasks in module which are discriminating or unique for the group<br />
08 = Percent of group's discriminating or unique tasks which are in the module<br />
<br />
Module  Description                    01     02     03     04     05     06      07     08<br />
GPO001  Hypobaric Chamber Operations   28  19.29   8.28  15.02    .00    .00     .00    .00<br />
GPO002  Classroom Instruction          18   3.33   1.24   2.84    .00    .00     .00    .00<br />
GPO003  Emergency Escape & Survival    12   1.67    .11    .31    .00    .00     .00    .00<br />
GPO004  Parachute Familiarization      20    .00    .00    .00    .00    .00     .00    .00<br />
GPO022  Centrifuge Operations          22  69.09  42.58  87.88  54.55  66.67  100.00  13.10<br />
GPO023  Research Chamber Operations    42   8.57   6.69   9.88   2.38   5.56   33.33   8.33<br />
GPO024  TU-103 Training                 6    .00    .00    .00    .00    .00     .00    .00<br />
GPO035  General Tasks                   3  33.33   1.10  10.40  33.33   5.56     .00    .00<br />
Figure 4. Example JOBMOD Report Showing Summary Relationships of Task Modules to a Job Group<br />
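Two of the JOBMOD summary fields above (02 and 03) can be sketched as simple aggregations over a module's tasks within one job group. The data values and helper names are invented, and the operational computation may differ in detail:

```python
# Hedged sketch of JOBMOD fields 02 and 03 for one job group: field 02 is
# the mean Percent Members Performing (PMP) across the module's tasks, and
# field 03 is the group's Percent Time Spent (PTS) summed over those tasks.
# All task ids and statistics here are invented.

def avg_pmp(pmp_by_task, module_tasks):
    """Field 02: mean PMP over the tasks in the module."""
    return sum(pmp_by_task.get(t, 0.0) for t in module_tasks) / len(module_tasks)

def sum_pts(pts_by_task, module_tasks):
    """Field 03: percent time spent, summed across the module's tasks."""
    return sum(pts_by_task.get(t, 0.0) for t in module_tasks)

# Group-level task statistics for a hypothetical job group
pmp = {"T1": 80.0, "T2": 60.0, "T3": 0.0}
pts = {"T1": 5.0, "T2": 3.0, "T3": 0.0}
module = ["T1", "T2", "T3"]

print(round(avg_pmp(pmp, module), 2))  # 46.67
print(round(sum_pts(pts, module), 2))  # 8.0
```

Computing such aggregates for every (job group, task module) pair is what lets a single printout summarize which modules are important components of a job.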
Discussion<br />
The advanced analysis assistance programs outlined here represent a substantial advance<br />
in the automation of CODAP analysis, aimed at permitting the occupational analyst to focus<br />
attention on making critical judgments, rather than spending hours and hours examining<br />
various case data or task data summaries in an attempt to develop an overall perspective on<br />
a specialty or occupational area. By using somewhat standardized displays which focus on<br />
possible job types or task clusters (modules) and defining relationships within and between<br />
given sets of jobs or modules, these programs permit an analyst to quickly decide what the<br />
potentially meaningful clusters are, and to proceed with other aspects of the analysis.<br />
There still remains some work to be done in terms of polishing the three still-experimental<br />
programs. After they are refined and finalized through additional operational testing, they<br />
will be released into the operational ASCII CODAP system, and will become available for<br />
implementation in military occupational analysis programs. Suggestions for additional<br />
analysis assistance programs which might be needed and useful are also welcome.<br />
References
Christal, R.E. (1974). The United States Air Force occupational research project (AFHRL-TR-73-75, AD-774 574). Lackland AFB, TX: Occupational Research Division, Air Force Human Resources Laboratory.
Christal, R.E., & Weissmuller, J.J. (1988). Job-task inventory analysis. In S. Gael (Ed.), Job analysis handbook for business, industry, and government. New York: John Wiley and Sons, Inc. (Chapter 9.3).
Driskill, W.E., Mitchell, J.L., & Tartell, J.E. (1980, October). The Air Force occupational analysis program - a changing technology. Proceedings of the 22nd Annual Conference of the Military Testing Association. Toronto, Ontario, Canada: Canadian Forces Personnel Applied Research Unit.
Haynes, W.R. (1989, January). JOB-TYPING, Job-typing programs. In: Comprehensive Occupational Data Analysis Programs. San Antonio, TX: Analytic Systems Group, The MAXIMA Corporation. Prepared for the Air Force Human Resources Laboratory [program documentation available on the AFHRL Unisys computer].
Mitchell, J.L. (1988). History of job analysis in military organizations. In S. Gael (Ed.), Job analysis handbook for business, industry, and government. New York: John Wiley and Sons, Inc. (Chapter 1.3).
Mitchell, J.L., Phalen, W.J., Haynes, W.R., & Hand, D.K. (1989, October). Operational testing of ASCII CODAP job and task clustering methodologies (AFHRL-TP-88-74). Brooks AFB, TX: Manpower and Personnel Division, Air Force Human Resources Laboratory.
Morsh, J.E. (1964). Job analysis in the United States Air Force. Personnel Psychology, 17, 7-17.
Phalen, W.J., Staley, M.R., & Mitchell, J.L. (1987, May). New ASCII CODAP programs and products for interpreting hierarchical and nonhierarchical clusters. Proceedings of the Sixth International Occupational Analysts' Workshop. San Antonio, TX: USAF Occupational Measurement Center.
Phalen, W.J., Staley, M.R., & Mitchell, J.L. (1988, December). ASCII CODAP programs for selecting and interpreting job and task clusters. Proceedings of the 30th Annual Conference of the Military Testing Association. Arlington, VA: U.S. Army Research Institute.
Weissmuller, J.J., Tartell, J.E., & Phalen, W.J. (1988, December). Introduction to operational ASCII CODAP: An overview. Proceedings of the 30th Annual Conference of the Military Testing Association. Arlington, VA: U.S. Army Research Institute.
PROFESSIONAL SUCCESS OF FORMER OFFICERS IN CIVILIAN OCCUPATIONS
Paul Klein<br />
Studying at Federal Armed Forces Universities<br />
Owing to the fact that the recruitment of personnel for military service was
becoming increasingly difficult, the Federal Minister of Defense set up a commission
to reorganize education and training in the Federal Armed Forces. In
mid-1971, the commission presented a report suggesting, among other things, the
reorganization of education and training for officers. In doing so, the commission
proceeded on the assumption that only by providing a system of education<br />
and training which - besides military requirements - also “considers to an increasing<br />
extent the soldiers’ individual interests regarding further education<br />
could we expect the Federal Armed Forces to become more attractive for volunteers,<br />
resulting in an increasing number of applicants” (Lippert/Zabel 1977,<br />
page 52).<br />
With respect to officer education and training, this statement consequently led<br />
up to the introduction of an academic course of studies as part of the officer<br />
education program. On the one hand, this course of study was to facilitate the
transition to civilian occupations for temporary-career volunteers after leaving<br />
the armed forces, thus making this type of career again more attractive for<br />
volunteers. On the other hand, it was also expected to be of benefit to all<br />
officers in the course of their service, in particular to regular officers with<br />
staff assignments, as the commission assumed that “the functions of officers in<br />
the fields of leadership, organization, training, and their responsibilities<br />
towards their subordinates today make different demands on them than they did in<br />
the past, and that these demands can hardly be met by a system of officer education<br />
and training which emphasizes a rather practical approach, i.e., passing<br />
on experience previously gained" (Ellwein et al. 1974, page 12). Finally, the
course of studies was to provide an alternative for regular officers who - for<br />
whatever reasons - might decide to correct their original choice of occupation.<br />
For pragmatic, economic, and academic reasons the commission suggested
that the Federal Armed Forces should establish their own universities. Lectures
at these universities commenced on 1 October 1973 in Munich and Hamburg.
For all officers with an extended period of enlistment, studying at one of the<br />
two Federal Armed Forces universities is an obligatory part of their education.<br />
An officer has three and a half years to complete his course of study. To make<br />
maximum use of this study period, which is rather short as compared with courses<br />
at civilian universities, studies are based on the trimester system.<br />
When the universities opened in 1973, the courses offered in Hamburg and Munich<br />
included mechanical engineering, electronics, economics and managerial science
as well as pedagogics, with additional courses provided in Munich in the fields
of aerospace engineering, civil engineering including geodesy, and computer
science. Additional courses have been added in the meantime.<br />
From a technical point of view, there are no major differences between the<br />
courses of studies provided at the two Federal Armed Forces universities and the<br />
corresponding courses provided at civilian universities. Just like there,<br />
studies are completed by taking the diploma examination. Students who pass the
examination successfully are conferred an academic degree, such as "Diplomingenieur".
Concept and Conduct of the Study
In 1983 the Federal Minister of Defense assigned to the Federal Armed Forces<br />
Institute of Social Sciences the task of conducting a study on the opportunities<br />
and problems involved in the transition of officers with extended periods of<br />
enlistment to the civilian working life. The study was to consider both officers<br />
who completed a course of studies at the Federal Armed Forces universities and<br />
temporary-career volunteers with an extended period of enlistment who left the<br />
Federal Armed Forces at the end of their service without an academic education,<br />
as well as jet pilots and weapons system operators who retired from service when<br />
reaching the age of 41.<br />
The study was designed as what is called a panel survey, i.e., all the officers
with an extended period of enlistment who left the armed forces in 1984 and 1985<br />
were questioned for the first time when retiring from active service, and a<br />
second time three and a half years later using a standardized questionnaire. The
results presented here have been obtained from the second survey
among the retired officers. This second survey was conducted from late 1987
through early 1989 and included almost 60 % of all temporary-career volunteers
with a twelve-year period of enlistment, as well as the pilots who left the
Federal Armed Forces in 1984 and 1985.<br />
Results<br />
If we take for granted that the results obtained are representative of officers
with an extended period of enlistment who have retired from active service, we
may say that those who graduated from a Federal Armed Forces university have<br />
managed to become integrated into the civilian working life.<br />
Using the answers obtained from the officers as a basis, we may permit ourselves
to state that the majority of university graduates had no major difficulties in<br />
adapting themselves to the demands made in the civilian sphere, and that they<br />
were successful in their subsequent civilian careers. Except for pedagogics,
all courses of study provided at the Federal Armed Forces universities
may well be said to pave the way for a civilian career as well.
Owing to the fact that the situation on the labor market was extremely unfavorable,<br />
it was difficult for those who graduated in pedagogics to find a civilian<br />
occupation closely related to their field of studies in the Federal Republic<br />
of Germany. The fact that they nevertheless managed to find occupations, even<br />
though many times outside the field of pedagogics, testifies to the flexibility<br />
and adaptability of these officers.<br />
Some three and a half years after leaving the Federal Armed Forces 80.8 % of the<br />
420 university graduates who had been questioned were employed, 1.9 % were<br />
trainees, and 0.2 % were out of work. Of those employed, 72.1 % worked for private<br />
enterprises and 18.6 % had joined the civil service. 7.7 % had set up on<br />
their own by that time, or worked freelance.<br />
There were clear differences with regard to satisfaction on the job, the income<br />
situation, and career prospects between officers who had decided to work for a<br />
private enterprise and those who had decided in favor of the civil service.<br />
Generally speaking, it may be said that those who chose the “safe” way of a<br />
civil service career - possibly because thinking along the lines of job security<br />
and shying away from taking risks - had to pay the price by having to put up<br />
with limited perspectives regarding income and promotion.<br />
Since in the Federal Republic of Germany the pay grades of officers are comparable
to the pay grades of other civil service members, we were able to find
out that the majority of the officers who decided in favor of a civil service<br />
career had not achieved a higher-ranking position as compared with the last<br />
position they held in their military career. Correspondingly, the same may be<br />
said of their financial situation.<br />
Those questioned who had opted for private enterprise revealed a quite different<br />
development. A mere 7.1 % of them stated that they earned less now than they had<br />
in their last assignment in the armed forces, as opposed to 80.0 % who said that
their income had increased slightly or even considerably. Particularly those<br />
working in the field of engineering regarded their financial position to be<br />
quite favorable. More than 75 % of them pointed out that their salary was now
considerably higher than the pay they had received as officers. All of the
computer scientists who were questioned said that their income had increased<br />
considerably.<br />
Of the university graduates questioned, 84.4 % were largely content with their<br />
civilian occupations and career prospects. Their expectations, so they said, had<br />
been met. 12.3 % said their satisfaction was limited and their expectations had<br />
not come true in many cases. Among those who spoke of limited satisfaction and<br />
disappointment on the job was a relatively large number of those who had studied<br />
pedagogics but also two of the seven computer scientists. Typically enough, only<br />
a few of the “disappointed” ones were employed with private enterprises; most of<br />
them had joined the civil service, with many of them using a way of access which<br />
the legislature primarily provided for non-commissioned officers retired from<br />
active service. (See Table 1.)<br />
The positive assessment of military service resulted in more than half of the<br />
graduates saying that they would go the same way again, if they had to decide<br />
once more. (See Table 2. )<br />
Dropouts and temporary-career volunteers or regular officers without a university<br />
education had to, cope with a much more difficult transition to a civilian<br />
occupation than had their graduate counterparts. Since, as a rule, they lacked
training in a civilian occupation, only some of them managed to find adequate
civilian employment immediately upon leaving the armed forces. Only 25 % of the
dropouts questioned and merely 10 % of the temporary-career volunteers and regular
officers without a university education found some civilian employment
immediately after leaving the armed forces. The considerable number who did not
had to undergo some sort of vocational training, either as in-plant trainees or
by attending schools, to meet the requirements for employment in the civilian
sphere. This was not easy for them, in particular if long-term training was
required. In those cases, financial bottlenecks and austerity became almost<br />
inevitable attendant circumstances, the more so since in not only a few cases<br />
they had to go through a period of unemployment - even if only brief in most<br />
cases - before starting their training period.<br />
Table 1
Overall Assessment of Military Service
Made by Officers Who Graduated From University

                                                 Number (%) of Answers Received
Assessment                                       Pedagogics  Economics  Engineering  Computer
                                                                        Subjects     Science
I am very content. Without any exceptions
my expectations and hopes have been fulfilled.    2 ( 3.6)       ?           ?           ?
I am content. My expectations and hopes
have mainly been fulfilled.                      40 (71.4)       ?       141 (82.4)   3 (42.9)
I am not content. Many of my expectations
and hopes have not been fulfilled.               11 (19.6)       ?        13 ( 7.5)   2 (28.6)
I am very discontent. None of my expectations
and hopes have been fulfilled.                    3 ( 5.4)    4 ( 4.0)     4 ( 2.3)      -
I cannot answer the question.                        -           ?           ?           ?
Number                                           56          100         171           7
(? = cell value illegible in the source reproduction)
Table 2
The Inclination of Officers Graduated From University
Towards (Hypothetical) First Enlistment

                                                 Number (%) of Answers Received
Options for Answering                            Pedagogics  Economics and  Engineering  Computer
                                                             Managerial Sc. Subjects     Science
I would enlist again for the same
number of years.                                     ?            ?             ?            ?
I would enlist again for a shorter
period of time.                                      ?        18 (18.0)     20 (11.6)       -
I would not join the Federal Armed Forces
again under any circumstances.                       ?        12 (12.0)     17 ( 9.9)       ?
I have not thought about this yet.
OR: I cannot say yet.                             3 ( 5.5)        ?             ?        3 (50.0)
Number                                           55          100           171           6
(? = cell value illegible in the source reproduction)
Without training in a civilian occupation, the chances on the labor market were
small for dropouts and temporary-career volunteers without an academic education.
Those who underwent vocational training had no major difficulties in
being integrated afterwards. The question as to whether this is also true of
officers who took up academic studies only after they had left the armed forces
cannot be answered at present, as they have not yet completed the respective
courses of studies.
Owing to the difficulties experienced in the transition to civilian life, dropouts
assessed the time they spent in the military less favorably than did university
graduates; temporary-career officers without an academic education,
however, seemed to be quite content with their military service.
Table 3
Overall Assessment of Military Service
Made By Dropouts and Temporary-Career Volunteers Without University Education

                                                 Number (%) of Answers
Assessment                                       Temporary-Career Volunteers   Dropouts
                                                 w/o Univ. Education
I am very content. Without any exceptions
my expectations and hopes have been fulfilled.       3 ( 3.4)                   2 ( 3.4)
I am content. My expectations and hopes
have mainly been fulfilled.                         66 (75.0)                  28 (47.5)
I am not content. Many of my expectations
and hopes have not been fulfilled.                  15 (17.0)                  15 (25.4)
I am very discontent. None of my expectations
and hopes have been fulfilled.                       2 ( 2.3)                  14 (23.7)
I cannot answer the question.                        2 ( 2.3)                      -
Number                                              88                         59
Among pilots there was a relatively high degree of discontent with regard to the<br />
civilian occupations they held three and a half years after retiring from active<br />
duty. This was primarily owing to the fact that these officers had attained high<br />
ranks before leaving the armed forces and had based their expectations on what<br />
they had achieved. If these expectations should remain unchanged it will be very<br />
difficult to remedy their situation. This applies in particular to those officers<br />
who not only expect their civilian salary to correspond to the rank they<br />
had attained (in most cases lieutenant colonel) but also try to continue their<br />
career as civilian pilots.<br />
Literature
Bildungskommission beim Bundesminister der Verteidigung (Ed.): Neuordnung der Ausbildung und Bildung in der Bundeswehr, Bonn 1971.
Ellwein, Th./Müller, A.v./Plander, H. (Ed.): Hochschule der Bundeswehr zwischen Ausbildungs- und Hochschulreform, Opladen 1974.
Hitpass, J./Mock, A.: Das Image der Universitäten, Düsseldorf 1972.
Klein, P.: Der Übergang längerdienender Zeitoffiziere in das zivile Berufsleben, München 1984.
Klein, P.: Die Bewährung ehemaliger Offiziere der Bundeswehr im Zivilberuf, München 1987.
Klein, P.: Truppendiensttauglich? Zur Bewährung von Absolventen der Bundeswehruniversitäten in der Truppe, in: W.R. Vogt (Ed.): Militär als Lebenswelt, Leverkusen 1988, S. 241-250.
Lippert, E./Zabel, R.: Bildungsreform und Offizierkorps, in: Sozialwissenschaftliches Institut der Bundeswehr, Berichte H.3, München 1977, S. 49-156.
A MILITARY OCCUPATIONAL SPECIALTY (MOS)<br />
RESEARCH AND DEVELOPMENT PROGRAM: GOALS AND STATUS<br />
Dorothy L. Finley and William J. York, Jr.<br />
U.S. Army Research Institute Field Unit<br />
Fort Gordon, Georgia<br />
Threat, force modernization, doctrine, and force structure
often change in ways which influence what is required with
respect to soldier performance. Responses to changes in soldier
performance requirements to assure adequate operation and
maintenance of the Army's inventory of systems often include
changes in MOS and CMF designs. These changes, called MOS
restructuring in this program, are its focus. MOS restructuring
is defined as the addition or deletion of tasks in an existing
MOS, the merger or deletion of MOSs, or the assignment of tasks
to a new MOS.
The Army is faced with growing and more varied inventories
of equipment (older equipments often cannot be disposed of due to
the insufficient numbers of new equipments), reduced manpower
ceilings, and a reduced and changing manpower pool. The
decisions made about MOS and Career Management Field (CMF)<br />
restructuring determine the number of soldiers in units versus in<br />
training (given manpower ceilings), the number of operators and<br />
maintainers needed to staff the equipments, the design of the<br />
training system, and the levels of aptitudes required. Analyses<br />
have demonstrated that what appears to be the best MOS<br />
restructuring option with respect to one of these factors may be<br />
a very bad option with respect to the other factors. The goal of<br />
this program is to develop decision aids to facilitate the<br />
identification of optimal, not suboptimal, MOS restructuring<br />
solutions with respect to manpower, personnel, and training<br />
resource considerations, and the requirements for unit<br />
performance.<br />
There are several considerations and constraints involved in
any action to restructure MOSs. A fundamental concern is task
and equipment commonalities and differences. One does not want
to assign a set of tasks to a soldier which are so different and
numerous as to impose too large a training requirement or
require too high a level of too many different aptitudes. One
does, on the other hand, want to assign a sufficiently large
number of tasks such that the soldier will be fully employed and can
be flexibly assigned. This concern must be considered within the
contexts of both requirements and constraints. Requirements
include such items as aptitude and gender job requirements,<br />
manpower utilization and training requirements, and the need for<br />
career progression opportunities. The constraints include such<br />
items as manpower pool characteristics and size, manpower<br />
ceilings, available training resources, geographical and
organizational distribution of the equipments, and the size of<br />
the MOS and relative percentages of soldiers across the grade<br />
levels. Overall, MOS restructuring can be summarized as a
complex, multi-dimensional decision. The considerations, and the
constraints weighed against the requirements, relate at least to
training impacts, personnel characteristics, force structure,
equipment design, personnel resources, manpower resources, and
task structure.
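The kind of multi-dimensional trade-off involved can be sketched abstractly. The following is a hypothetical illustration of weighted multi-criteria scoring of restructuring options, not one of the program's actual algorithms; the option names, dimensions, scores, and weights are all invented:

```python
# Hypothetical sketch of the trade-off the planned decision aids must support:
# each restructuring option is scored on several dimensions, and a weighted sum
# exposes options that look good on one factor but poor overall.

WEIGHTS = {"training_cost": -0.4, "manpower_fit": 0.3,
           "aptitude_demand": -0.2, "career_progression": 0.1}

options = {
    "merge_A_B": {"training_cost": 8, "manpower_fit": 9,
                  "aptitude_demand": 7, "career_progression": 6},
    "new_MOS":   {"training_cost": 3, "manpower_fit": 5,
                  "aptitude_demand": 4, "career_progression": 8},
}

def score(opt):
    # Negative weights penalize dimensions where "more" is worse.
    return sum(WEIGHTS[d] * v for d, v in opt.items())

best = max(options, key=lambda name: score(options[name]))
for name in options:
    print(name, round(score(options[name]), 2))
print("preferred:", best)
```

An option that minimizes training cost alone could still rank last here, which is the point made above: the best restructuring choice on one factor may be a very bad one overall.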
As noted above, the program objective is to develop aids to
facilitate MOS and CMF restructuring decisions regarding such
questions as: Is restructuring needed at all? Should a new MOS
be created? Should MOSs be merged? Is an overall redesign of
the branch MOSs and CMFs needed? Whatever is done impacts
directly on the branch training system design. The addition or
deletion of tasks which require training imposes a requirement to
modify the training system to accommodate those changes.
Program Overview<br />
The current formulation of the program is presented in Figure
1. Work has been accomplished or is projected for the near
future on: The Army Authorization Documentation System (TAADS) (a
manpower data base), and personnel and training data bases; the
ability, equipment, and task domains; and trade-off algorithms.
Recent accomplishments with respect to the TAADS data base, and
the ability and equipment domains, will be described in the next
section.
As depicted in Figure 1, the intent is to provide the
analyst with the tools needed to identify desirable MOS
restructuring possibilities, and to consider these within
manpower, personnel, and training resource constraints, and then
to provide the means to do tradeoffs between the alternatives
with respect to manpower, personnel, and training impacts. In
Figure 1, under "Trade-Off Algorithms", both operations-based and<br />
requirements-based are noted. Operations-based analyses are<br />
those performed, in the Army, by the Personnel Proponent as the<br />
basis for preparing the paperwork which will actually cause a MOS<br />
restructure action to be implemented. These analyses are<br />
sometimes triggered by the outcomes of requirements-based<br />
analyses. Requirements-based analyses often take place when<br />
there is a major change in equipment inventories, doctrine,<br />
organization, or force structure. These requirements-based<br />
analyses tend to be performed by the combat developers in<br />
coordination with the training developers and personnel<br />
proponents.<br />
TAADS and PMAD Data Base
Personnel and Training Data Base
Ability Domains
Equipment Domains
Task Domains

[In Figure 1, these components, together with current and projected position and
manpower, personnel, and training resources, feed operations-based and
requirements-based trade-off algorithms, which yield optimum manpower,
personnel, and training alternatives.]

Figure 1. Overview of the MOS restructuring program to develop decision aids.
Recent Accomplishments
Equipment Domains
Equipment domains are defined as groupings of equipments<br />
based on their similarities with respect to equipment<br />
descriptors. Human factors specialists dealing with the<br />
development of new systems have always defined tasks, ability<br />
requirements, etc. in terms of the design of that new item of<br />
equipment. Many MOSs, however, deal with many systems and, when
a new item of equipment is entered into the inventory, then one
must consider inventory groupings in making MOS assignment or
restructuring decisions. The identification of appropriate
descriptors began as a part of assisting the Signal Branch
Personnel Proponent in developing a training strategy to support
the merger of three MOSs, with two of them becoming an Additional
Skill Identifier (ASI) to the merged MOS. After investigation it
one of the MOSS should be assigned an ASI. This finding resulted<br />
in training cost savings, a reduced training attrition rate,<br />
improved position fill capability, and increased potential to the<br />
soldiers for promotion. Drawing upon this research, an initial<br />
Equipment Domains Assessment Procedure (EDAP) has been developed<br />
which shows promise for identifying equipment domains appropriate<br />
for operators. Identifying equipment domains appropriate for<br />
maintainers is a more complex problem and will take further<br />
research.<br />
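In the spirit of the EDAP described above, grouping equipments by descriptor similarity might look like the following sketch; the equipments, descriptors, similarity measure, and threshold are all invented for illustration, since the actual procedure is not detailed here:

```python
# Illustrative sketch only: grouping equipments whose descriptor profiles agree
# on a sufficient fraction of descriptors. All names and values are invented.

EQUIPMENT = {
    "radio_A":  {"power": "battery", "interface": "keypad",  "signal": "VHF"},
    "radio_B":  {"power": "battery", "interface": "keypad",  "signal": "UHF"},
    "switch_C": {"power": "vehicle", "interface": "console", "signal": "wire"},
}

def similarity(a, b):
    """Fraction of descriptors on which two equipments agree."""
    keys = set(a) | set(b)
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)

def group(equip, threshold=0.6):
    """Greedy single-pass grouping: join an item to the first group whose
    seed item it resembles at or above the threshold."""
    groups = []
    for name, desc in equip.items():
        for g in groups:
            if similarity(desc, equip[g[0]]) >= threshold:
                g.append(name)
                break
        else:
            groups.append([name])
    return groups

print(group(EQUIPMENT))  # [['radio_A', 'radio_B'], ['switch_C']]
```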
Ability Domains<br />
The Army Research Institute Fort Huachuca Field Unit has
refined the Job Abilities Assessment System (JAAS) and added a
part C, in addition to existing parts A and B, specific to
military intelligence. Mr. York will present a paper at this
meeting describing the application of the refined JAAS, parts A
and B, to MOSs in the Signal Branch. I am going to describe
their application to intelligence MOSs to derive ability
requirements profiles that can be compared to assess the
reasonableness of MOS assignment to a new system. It is of
interest to us because the profiles are derived through analysis
of the tasks assigned to the soldier and, therefore, provide a
means of appraising whether the proposed restructuring, i.e., the
reassignment of tasks to MOSs, creates too great a demand on
ability requirements.
JAAS consists of a taxonomy of 50 abilities (e.g., dynamic<br />
strength, written expression) which, for presentation purposes,<br />
are often grouped into eight clusters (e.g., gross motor skills,<br />
communication skills) and a set of procedures for making scalar<br />
judgments regarding the level of each of the 50 abilities<br />
required to perform a set of tasks. This technique was used to<br />
develop ability profiles for several intelligence MOSs and to
appraise the ability requirements for a new intelligence system.<br />
It was determined that some of the intelligence MOS ability<br />
requirements profiles were distinctly different. It was further<br />
determined that the particular MOS selected to perform operations<br />
and control tasks on the new system was a good choice in that the<br />
ability requirements profile for the MOS closely matched the<br />
ability requirements profile for those tasks on the new system.<br />
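Profile comparison of this kind can be illustrated with a small sketch; the ability names, scale values, and the distance metric below are assumptions for illustration, not the procedure used in the study:

```python
# Illustrative sketch only: comparing a JAAS-style ability requirements profile
# for an MOS with the profile for tasks on a new system. Ability names and
# rating values are invented.

mos_profile = {"oral comprehension": 5.2, "written expression": 3.8,
               "selective attention": 6.1, "dynamic strength": 2.0}
system_profile = {"oral comprehension": 5.0, "written expression": 4.0,
                  "selective attention": 6.3, "dynamic strength": 2.4}

def profile_distance(p, q):
    """Mean absolute difference across abilities; smaller = closer match."""
    return sum(abs(p[a] - q[a]) for a in p) / len(p)

d = profile_distance(mos_profile, system_profile)
print(round(d, 2))  # a small distance suggests the MOS is a reasonable assignment
```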
TAADS Data Base<br />
Up to 60% of the effort required on the part of the
Personnel Proponent to prepare a MOS restructuring action is
devoted to position data analysis. Position data analysis is an
analysis of the TAADS and Personnel Management Authorization Data
(PMAD) data bases for each of the impacted MOSs. These contain
detailed information on each MOS position currently authorized
(TAADS) and projected (PMAD). This analysis is largely performed
manually and, hence, very time consuming and error prone. There
are criteria as to appropriate grade structure, etc., and it is
essentially a "zero sum game". The TAADS and PMAD constitute the
constrained manpower data base at a MOS position-by-position
level with a great deal of associated information.<br />
A Position Data Analysis Job Aid-l (PDAT-JA-1) software<br />
TAADS analysis tool has been developed which will be installed at<br />
the first Personnel Proponent office (at the Signal Branch) in<br />
December. It automates manipulation of the TAADS data base and<br />
provides analysis tools. A place holder is in the program for<br />
the PMAD data base when it becomes available in the form needed<br />
for our purposes. The PDAT-JA-1 outputs are:<br />
* Quantitative summaries of MOS authorization for each<br />
grade level by: grand total, ASI, Skill Qualification Identifier _,<br />
(SQI), major command (MACOM), tables of organization and<br />
equipment (TOE), tables of distribution and allowances (TDA),<br />
continental United States (CONUS), and outside CONUS (OCONUS);<br />
* Deviations from the Average Grade Distribution matrix;
* Deviations from criteria regarding space imbalanced MOS
(SIMO), gender, ASIs, and SQIs;
* Development of an acceptable grade structure; and<br />
* (When PMAD program implemented) Identification of TAADS<br />
and PMAD mismatches by unit identification code and grade.<br />
The development of an acceptable grade structure is enabled
through the provision of work sheets which allow the analyst to
create modified TAADSs (the original data base always remains
intact). Each modified TAADS can then be run to produce the first
three outputs above until an acceptable grade structure is
realized.
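The first output listed above amounts to cross-tabulated counts of position records. A minimal sketch with invented records and field names (the real TAADS records carry far more detail):

```python
# Illustrative sketch only: quantitative summaries of MOS authorizations per
# grade, broken out by a second field such as MACOM.

from collections import Counter

positions = [
    {"grade": "E4", "macom": "FORSCOM"}, {"grade": "E4", "macom": "TRADOC"},
    {"grade": "E5", "macom": "FORSCOM"}, {"grade": "E4", "macom": "FORSCOM"},
]

by_grade = Counter(p["grade"] for p in positions)
by_grade_macom = Counter((p["grade"], p["macom"]) for p in positions)

print(dict(by_grade))        # grand totals per grade
print(dict(by_grade_macom))  # per-grade totals within each MACOM
```

The same pattern extends to the other breakouts named above (ASI, SQI, TOE/TDA, CONUS/OCONUS) by changing the key tuple.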
APPLICATION OF THE JOB ABILITY ASSESSMENT SYSTEM TO<br />
COMMUNICATION SYSTEM OPERATORS<br />
WILLIAM J. YORK, JR. and DOROTHY L. FINLEY<br />
U.S. ARMY RESEARCH INSTITUTE FIELD UNIT<br />
FORT GORDON, GEORGIA<br />
As the Army introduces major new equipment into its
inventory, there is a need to restructure Military Occupational
Specialties (MOS) and to reclassify soldiers from old MOSs into
new MOSs. Identification and quantification of specific soldier
abilities required to perform in a new MOS would enhance both the
training development and personnel management decision processes
associated with major reclassification actions. A method for
mapping soldier ability requirements from old to new MOSs would
provide Army managers with a useful tool in the areas of force
structure design and personnel or job classification.
In support of this, the ARI Fort Gordon Field Unit is<br />
conducting research using the JAAS methodology developed by<br />
Fleishman to determine whether significant differences in the<br />
JAAS abilities exist among Signal MOSs and whether unique ability<br />
patterns are strong enough to support mapping from old MOSs<br />
to new ones. Moreover, we hope to identify a group of abilities<br />
that could be measured by existing tests. This effort supports a<br />
need to determine how best to reclassify soldiers from several<br />
existing Signal MOSs into two new MOSs.<br />
These two new MOSs support a recently introduced area<br />
communication system that is to replace the majority of the<br />
current division and corps Signal equipment and structure.<br />
Reclassification and training of current MOS holders to perform<br />
in the new MOSs is a critical issue. The feasibility of using the<br />
JAAS methodology to determine which new MOS is most similar to<br />
existing MOSs is the primary research goal.<br />
Our initial effort focused on existing communication MOSs.<br />
Using the JAAS abilities shown in Figure 1 and the ability<br />
description and scale shown in Figure 2, two groups of<br />
subject matter experts (SMEs) rated four Signal operator MOSs:<br />
31C, 31M, 31L, and 72E. Group A, consisting of seven senior<br />
personnel, rated all four MOSs. Group B, consisting of nine to<br />
eleven SMEs per MOS, rated only their own MOS. Mean scores by ability<br />
and ability cluster were calculated for each MOS. Interrater<br />
reliability was determined by applying Kendall's coefficient of<br />
concordance to the rank-ordering of the eight ability clusters.<br />
As shown in Table 1, rater agreement varied significantly<br />
among the four MOSs, as well as between the two groups of<br />
raters. Figures 3 and 4 depict the difference in profiles<br />
between the two groups. Table 2 shows examples of actual ratings<br />
for two MOSs, the best and the worst in terms of rater<br />
agreement.<br />
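The interrater reliability check described above (Kendall's coefficient of concordance applied<br />
to rank-orderings of the eight clusters) can be sketched as follows; the rank matrix shown is<br />
illustrative only, not the study's ratings.<br />

```python
# Kendall's coefficient of concordance W for m raters each ranking the
# same n items (here, ability clusters). Illustrative data only.

def kendalls_w(ranks):
    """ranks: m rows, each a permutation of 1..n for the same n items."""
    m = len(ranks)                      # number of raters
    n = len(ranks[0])                   # number of items ranked
    # Sum of ranks each item received across raters.
    totals = [sum(row[j] for row in ranks) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # W = 12 S / (m^2 (n^3 - n)); no correction for ties.
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three raters in perfect agreement over four clusters: W = 1.0.
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))  # → 1.0
```

W runs from 0 (no agreement) to 1 (perfect agreement), which is the scale on which the<br />
values in Table 1 should be read.<br />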
TABLE 1<br />
Kendall's coefficient of concordance W by MOS and rater group<br />
MOS     GROUP A     GROUP B<br />
72E     0.324852    0.343915<br />
31C     0.333605    0.445887<br />
31L     0.242180    0.124285<br />
31M     0.090452    0.286190<br />
TABLE 2<br />
Examples of actual cluster rankings for the worst (31M, Group A)<br />
and best (31C, Group B) MOSs in terms of rater agreement<br />
31M (Group A):<br />
           COMM  CON  REA  SPLD  PER-V  PER-A  PSY  GRMO<br />
RATER 1      1    3    2    7     8      5      4    6<br />
RATER 2      7    3    5    8     1      2      6    4<br />
RATER 3      1    5    3    1     7      6      4    8<br />
RATER 4      8    4    2    1     5      3      6    7<br />
RATER 5      1    3    4    6     2      5      8    7<br />
RATER 6      8    7    5    4     1      6      2    3<br />
RATER 7      5    6    7    2     3      1      4    8<br />
FINAL W = 0.090452<br />
31C (Group B):<br />
           COMM  CON  REA  SPLD  PER-V  PER-A  PSY  GRMO<br />
RATER 1      1    6    4    2     5      3      7    8<br />
RATER 2      1    7    5    3     6      4      2    8<br />
RATER 3      2    4    8    7     3      5      1    6<br />
RATER 4      2    4    6    3     5      1      7    8<br />
RATER 5      4    3    7    8     5      1      2    6<br />
RATER 6      3    7    8    5     6      1      2    4<br />
RATER 7      1    6    7    5     3      2      4    8<br />
RATER 8      1    2    3    4     5      6      7    8<br />
RATER 9      1    3    5    4     7      2      6    8<br />
RATER 10     3    4    8    5     7      1      2    6<br />
RATER 11     6    4    7    3     8      1      2    5<br />
FINAL W = 0.445887<br />
Correlation analyses between each pair of the four MOSs were<br />
conducted using the MOS mean of each of the 50 abilities. Ratings<br />
for Groups A and B were combined for this analysis. Results<br />
are shown in Table 3.<br />
TABLE 3<br />
CORRELATION MATRIX<br />
        72E     31C     31L     31M<br />
72E      -     .7830   .0862   .5782<br />
31C             -      .0824   .5527<br />
31L                     -      .1280<br />
31M                             -<br />
Statistical analyses of mean differences between MOSs by ability<br />
and ability cluster have not been conducted, but visual<br />
examination indicates that differences do exist at both the<br />
ability and cluster levels, as depicted in Figures 5 and 6.<br />
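The pairwise coefficients in Table 3 are ordinary product-moment correlations over the mean<br />
ability ratings; a minimal sketch follows, using short stand-in profiles rather than the<br />
study's 50-ability vectors.<br />

```python
# Pearson correlation between two MOS mean-ability profiles.
# The short profiles below are stand-ins for the 50-ability vectors.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Two perfectly proportional profiles correlate at 1.0.
print(round(pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]), 6))  # → 1.0
```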
As already shown, rater reliability is poor, but this may be a<br />
function of the analysis approach; additional analysis will focus<br />
on the ability level instead of the cluster level. We found the<br />
correlation results highly interesting in that the degree of<br />
relationship between the MOS pairs is highly supportive of the<br />
relationships subjectively thought to exist. Moreover, we expect<br />
to see even stronger relationships, both positive and negative,<br />
at the ability subset level. For example, a correlation analysis<br />
between each MOS pair using the ability groupings of<br />
communications, auditory, psychomotor, and gross motor should<br />
reveal this increase. We believe that this approach will also be<br />
applicable to a comparison analysis between current MOSs and new<br />
MOSs.<br />
Our future efforts will focus on several areas. First, we<br />
plan to determine rater agreement for each ability across each<br />
MOS. Using these results in conjunction with analysis of mean<br />
differences between abilities and MOSs, we intend to focus on a<br />
reduced set of abilities. This subset will be a function of rater<br />
reliability and discriminative power between MOSs; in other words,<br />
those abilities that have the best rater agreement and tend to<br />
discriminate between MOSs will be used. In addition to the<br />
development of a refined subset of abilities, we plan to analyze<br />
MOSs at the major-duty level. Two new MOSs, 31D and 31F, will<br />
be analyzed using the JAAS procedure and then compared with the<br />
total MOS and major duty profiles of the four MOSs already<br />
completed. These two MOSs are the operators for the new area<br />
communications system and are the MOSs into which a significant<br />
number of Signal soldiers will be reclassified. It is hoped that<br />
the profile comparisons will provide objective information for<br />
this reclassification effort.<br />
Figure 1: Revised list of abilities and clusters (conceptual skills; perceptual skills:<br />
vision; perceptual skills: audition; gross motor skills). [The numbered ability list is not<br />
recoverable from this copy.]<br />
Figure 2: Ability description and rating scale (example anchor: "bicycle 20 miles to work").<br />
[Scale details are not recoverable from this copy.]<br />
Preferences for <strong>Military</strong> Assignments in German Conscripts<br />
Introduction<br />
K. Arndt<br />
Federal Office of Defense Administration<br />
Bonn, Germany<br />
What German conscripts know about available military assignments is primarily based on<br />
information from friends and acquaintances who have already done their military service. The<br />
media, the military counselor and visits to military units (open day) constitute an additional but<br />
less important source of information. It is generally true to say that knowledge and a general<br />
overview of all the military assignments that are available depend on whether information has<br />
been obtained passively or actively. Preconceived ideas about military activities often lead<br />
to discrepancies between expectations and everyday military life. Lack of motivation, indifferent<br />
feelings about military service and discontent with the draft procedure are the consequences.<br />
In addition to the need for objective standardized information on military assignments, measures<br />
were required to counteract the negative image of the armed forces, since the willingness of<br />
young men to do military service has continually been decreasing over the past years. Against<br />
the background of a military threat, which was perceived to be real, the majority of those liable<br />
to military service passively agreed to military service, but as early as 1987 this percentage<br />
declined to less than 50 % for the first time. It must be assumed that this development has<br />
continued to date. As a result of these findings, it was decided to develop a transparent and<br />
efficient method designed to provide young men liable to military service with an overview of<br />
the requirements and qualifications for military assignments and a clear picture of job descriptions.<br />
The "Assignments - Interests - List" (AIL) is the result of this development, during which variant<br />
models of information transfer and target-oriented representation were pretested. The results<br />
of a nation-wide AIL test are reported.<br />
Description of AIL<br />
On the basis of expert ratings, the 117 possible assignments for conscripts were reduced to 25<br />
representative assignments covering both fighting and non-fighting troops. A brief description<br />
was compiled for each of the selected assignments, including a picture of a typical activity and<br />
an account of the most important requirements and features of the job.<br />
The Assignments-Interests-List comprises the military assignments as described in Table 1. The<br />
AIL method can be used for groups or individuals. Each item is looked at without any time limit.<br />
Pretest results<br />
An initial pretest was carried out with a sample of 105 persons liable to military service. Of the 452<br />
preferences stated, 256 (57 %) went to assignments with non-fighting troops and 196 (43 %) to<br />
fighting troops. Application of the AIL method produced a marked increase in the number of<br />
desired assignments indicated: without AIL, the average number of assignments considered to<br />
be interesting was 2.6, while use of AIL produced an increase to an average of 4.4. Education<br />
level had no ascertainable influence on the preferences expressed. The time to work through<br />
the test ranged from 6 to 22 minutes.<br />
Table 1<br />
Military Assignments in the AIL<br />
Fighting Troops              Non-Fighting Troops<br />
military policeman           clerk<br />
light infantryman            radar operator<br />
mountain trooper             teletypist<br />
paratrooper                  supplyman<br />
mechanized infantryman       second cook<br />
gunner                       driver<br />
missile gunner               electronics technician<br />
gunlayer                     radiomechanic<br />
engineer                     aircraft mechanic<br />
deck hand                    armament repairman<br />
signal construction man      automotive vehicle mechanic<br />
signal operating man         medical corpsman<br />
radio operator<br />
Sample<br />
1,225 persons liable to military service were tested by means of AIL during the psychological<br />
qualification and placement test before they were drafted into military service. The composition<br />
of the sample ensured regional and educational representativeness.<br />
Method<br />
The 25 items of the AIL are listed in a fixed order. The testee is asked to give his opinion on<br />
each item by indicating whether he is interested or uninterested in the described assignment.<br />
Consequently, the response is obtained by the “forced-choice” method. The testee indicates his<br />
judgement on a response form which offers two categories. The preferences are numerically<br />
represented by a preference score P, which is defined as the interest-to-disinterest ratio:<br />
P = total interest / total disinterest<br />
P = 1.00: interest in and disinterest in the item are equally great, i.e. there is indifference;<br />
P > 1.00: interest outweighs disinterest, i.e. the item is preferred;<br />
P < 1.00: disinterest outweighs interest, i.e. the item is rejected.<br />
The preference scores P were placed in a rank order to show the preferences for assignments of<br />
the AIL items. Rank order comparisons between subsamples were carried out by applying<br />
nonparametric procedures (Kendall’s coefficient of concordance W). The statistical evaluations<br />
were performed on an IBM AT PC with SPSS/PC+ standard software.<br />
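The preference score defined above is simply the interest-to-disinterest ratio per item; a<br />
minimal sketch, using invented forced-choice responses rather than the survey data:<br />

```python
# Preference score P for one AIL item: number of "interested" responses
# divided by number of "uninterested" responses. Responses are invented.

def preference_score(responses):
    """responses: booleans, True = interested, False = uninterested."""
    interested = sum(responses)
    uninterested = len(responses) - interested
    return interested / uninterested

# 30 of 40 testees interested: P = 3.0 > 1.0, i.e. the item is preferred.
print(preference_score([True] * 30 + [False] * 10))  # → 3.0
```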
Acceptance of AIL<br />
Following the pretest, the testees could express their opinion on AIL by answering the following<br />
three questions:<br />
Question 1: Do you consider such information on military assignments to be a necessary part of<br />
the qualification test?<br />
Question 2: Has AIL provided you with any details about military assignments which you did<br />
not know before?<br />
Question 3: Are you interested in further information?<br />
Table 2 shows the response frequencies for the response categories.<br />
Table 2<br />
Response frequencies of the acceptance poll (yes = + / no = - / partly = 0; in percent)<br />
                            EDUCATIONAL LEVEL<br />
              total        1)           2)           3)           4)<br />
Question    +   -   0    +   -   0    +   -   0    +   -   0    +   -   0<br />
    1      86   6   8   67  22  11   85   6   9   88   3   9   92   4   4<br />
    2      57   9  34   82   9  39   57   9  44   56   8  46   34  13  31<br />
    3      56  26  18   55  29  16   54  26  20   59  25  16   54  28  18<br />
1) without school-leaving certificate of secondary-level primary school<br />
2) with school-leaving certificate of secondary-level primary school<br />
3) lower secondary school-leaving certificate<br />
4) secondary school graduation<br />
The results of the acceptance poll reveal that information on military assignments provided by<br />
AIL is considered to be necessary by the majority. The higher the education level, the more<br />
widespread is this opinion.<br />
AIL offers only basic information on selected military assignments, but its information content<br />
is detailed enough for more than 50 % of the respondents to gain additional information they<br />
did not have before. In contrast to surveys which point to a monotonic relationship between<br />
knowledge of military service and education level, there is no ascertainable difference in this<br />
regard.<br />
Regardless of education level, more than 50 % of all respondents wished to obtain more<br />
information about the military assignments presented. Bearing in mind previous surveys looking<br />
into the attitudes towards military service of young men liable to military service which showed<br />
a growing rejection with increasing education levels, this result was somewhat unexpected.<br />
Despite increasingly negative attitudes towards military service, interest in obtaining information<br />
does not decline with higher education levels.<br />
In conclusion, the results of the acceptance poll show that<br />
- the respondents do not have much information on specific military assignments,<br />
- the respondents are keen to obtain additional information,<br />
- the majority of respondents welcome information on military assignments.<br />
Results<br />
The opinions expressed on the 25 AIL items in the test sample were analyzed with a view to<br />
answering the following questions:<br />
(1) Which military assignments arouse the greatest interest and which tend to appear uninteresting?<br />
(2) Are opinions about military assignments influenced by educational level or regional factors?<br />
(3) Do assignments with fighting and non-fighting troops meet with different degrees of interest?<br />
The following figure shows the preference scores P for the 25 AIL items based on the two<br />
response categories (interesting/not interesting). It clearly highlights the fact that only the<br />
assignment as driver achieves a P-score above 1.0. Since military driving licenses continue to be<br />
valid in civilian life upon completion of military service, assignment to a driver's job is of great<br />
benefit to military conscripts. The other AIL items received much lower preference scores (all<br />
less than 1.0). The scores are shown in ranked order.<br />
Figure 1: Preference scores for AIL items in the overall sample (N = 1,225).<br />
Preference scores of the first ten rank positions:<br />
Rank  AIL item               P<br />
 1    driver                3.73<br />
 2    clerk                  .74<br />
 3    auto. vehic. mech.     .74<br />
 4    military police        .70<br />
 5    radar operator         .59<br />
 6    armament repair        (illegible)<br />
 7    aircraft mech.         .52<br />
 8    gunner                 .46<br />
 9    light infantry         .44<br />
10    paratrooper            .42<br />
The subsamples based on school education and regional background produced preferences<br />
completely different from those obtained for the overall sample. As highlighted in Table 3, the<br />
number of preferred military assignments (P > 1.0) increases as the level of school education<br />
rises.<br />
Table 3<br />
Preference scores P > 1.0 for subsamples based on school education<br />
and regional background<br />
                     R e g i o n s<br />
           Northern  Central  Southern   Sum<br />
level 1)       1        5        2        8<br />
level 2)       1        5        2        8<br />
level 3)       1        6        3       10<br />
level 4)       4        5        7       15<br />
Total          7       21       14       41<br />
1) without school-leaving certificate of secondary-level primary school<br />
2) with school-leaving certificate of secondary-level primary school<br />
3) lower secondary school-leaving certificate<br />
4) secondary school graduation<br />
A statistical analysis of the rank-ordered P-scores using Kendall's coefficient of concordance<br />
W did not produce any significant differences between the school education samples within the<br />
specific regions.<br />
The results presented in Table 3 show that although the AIL items were appraised in the regions<br />
with varying degrees of intensity, the ranking positions of the preference scores largely coincide.<br />
To compare the preferences for assignments with fighting as opposed to non-fighting forces, the<br />
mean ranking positions of the individual assignments were taken. In both the overall sample<br />
and the regional subsamples, assignments with non-fighting forces achieved a significantly<br />
better mean ranking position than those with fighting forces.<br />
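The group comparison above averages the rank positions of the items belonging to each troop<br />
category; sketched below with invented ranks (1 = most preferred):<br />

```python
# Mean ranking position for fighting vs. non-fighting assignments.
# The item ranks below are invented for illustration (1 = most preferred).

ranks = {"driver": 1, "clerk": 2, "gunner": 8, "paratrooper": 10}
fighting = ["gunner", "paratrooper"]
non_fighting = ["driver", "clerk"]

def mean_rank(items):
    """Average rank position of the listed items; lower = more preferred."""
    return sum(ranks[i] for i in items) / len(items)

print(mean_rank(fighting), mean_rank(non_fighting))  # → 9.0 1.5
```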
Table 4<br />
Mean ranking positions for assignments with fighting<br />
and non-fighting forces (* = p < .05; ** = p < .01)<br />
                                                Difference in<br />
Region     Fighting forces  Non-fighting forces  ranking position<br />
Northern        16.4              9.1                7.3 **<br />
Central         15.0             10.4                4.6 *<br />
Southern        15.8              9.6                6.2 **<br />
Total           14.9             10.7                4.2 *<br />
Preference for assignments with non-fighting forces was found to be broadly the same throughout<br />
the overall sample and the regional subsamples. In the subsamples based on school education,<br />
however, there was no such uniform appraisal (Table 5).<br />
Table 5<br />
Mean ranking positions for assignments with fighting/non-fighting forces<br />
in subsamples based on school education 1)<br />
                                                       Difference in<br />
School education  Fighting forces  Non-fighting forces  ranking position<br />
1)                     12.8              12.8               0.0<br />
2)                     13.5              12.2               0.7<br />
3)                     16.2               9.0               7.2 **<br />
4)                     15.9               9.8               6.1 **<br />
Total                  14.9              10.7               4.2 *<br />
1) School education levels (1 to 4): see Table 4.<br />
It would appear that the preference for assignments with non-fighting forces increases with the<br />
level of school education. Groups with a lower education level showed little or no difference in<br />
their preference for fighting or non-fighting forces, while those with a higher level of education<br />
clearly preferred assignments with non-fighting forces.<br />
The following cross-tabulation with the determinants "region" and "school education" highlights<br />
the differences between assignments with non-fighting and fighting forces. Conscripts with a<br />
lower education level show no or non-significant differences for assignments with fighting or<br />
non-fighting forces. In contrast, these differences are clearly pronounced and highly significant<br />
in the case of conscripts with a higher level of education. With the exception of those with a<br />
school-leaving certificate of a secondary-level primary school in the southern region and those<br />
with a lower secondary school-leaving certificate in the northern region, regional impacts on the<br />
preferences are negligible. When compared to corresponding samples in other regions, these two<br />
samples exhibit significantly high differences in the mean ranking positions.<br />
Table 6 / Figure 2: Differences in the mean ranking position for assignments with fighting and<br />
non-fighting forces according to school education (levels 1 to 4, see Table 4) and region<br />
(significant differences underlined). [The cross-tabulated values are not recoverable from this<br />
copy.]<br />
The results presented here concerning military assignment preferences in samples with different<br />
educational and regional backgrounds are based on the mean ranking positions for the various<br />
assignments with fighting and non-fighting forces. An analysis of the appraisals of the individual<br />
assignments produces quite divergent results. For example, in all regions those with the highest<br />
school-leaving certificate most strongly prefer the assignment as "gunlayer" among the fighting<br />
forces, but it is only in the central region that there is a clear preference for the assignment as<br />
"paratrooper".<br />
Conclusions<br />
AIL is an effective and objective way of providing information and ascertaining the assignment<br />
preferences of those liable to military service. In addition to individual assignment preferences,<br />
which are important for placement, it is possible to find out about the main preferences of those<br />
liable to military service and the way they are affected by their regional and educational<br />
background. On this basis, information measures can be taken in the pre-draft phase<br />
(e.g. in recruitment campaigns). Changes in preference scores will reveal whether such measures<br />
are effective.<br />
The AIL procedure is beneficial both to the Federal Armed Forces as an organization and to those<br />
liable to military service. Aptitude diagnosis is thus understood to be a cooperative process<br />
between equal partners which gives the prospective conscript adequate guidance and allows<br />
room for initiative and active participation. The "classical" diagnostic criteria of objectivity,<br />
reliability and validity are supplemented by features such as fairness, transparency, acceptance,<br />
counselling and innocuousness. Aptitude diagnosis in this form seeks to benefit both sides<br />
(testing organisation and individual candidate) equally.<br />
Aptitude-Oriented Replacement of Conscript Manpower<br />
in the German Bundeswehr<br />
Retrospective View<br />
S. B. Schambach<br />
Federal Office of Defense Administration, Bonn, Germany<br />
In 1990, the Psychological Service of the German Federal Armed Forces (GFAF) is celebrating<br />
a very special jubilee: the Aptitude and Placement Examinations (EVP) for Draftees at the<br />
Subregion Recruiting Offices have been carried out for 25 years.<br />
Before the EVP was introduced, manpower requirements for draftees had been met solely on the<br />
basis of medical fitness, the final assignment of a position being controlled by a lot system. At<br />
his muster, each draftee obtained a rank number chosen at random. The slots for replacement<br />
were assembled in a list and also given rank numbers. The draftees were called up by the order<br />
of the list until the slots were filled. For numerous assignments, though, only men were called<br />
up who had a specified civilian occupational training.<br />
The effect of the lot system was that many men of restricted medical fitness were assigned jobs<br />
which they were hardly apt for, while well qualified men failed to be called up for service. In<br />
contrast to this, the increasing standard of technical equipment of the forces required higher<br />
ability of the assigned manpower. In public, criticism of the call-up “lottery game” was growing.<br />
It was for these reasons that in 1965 the Aptitude and Placement Examination was instituted to<br />
be taken by each draftee who was found medically fit at his muster. The examination method<br />
was modeled on the US-American "Army Alpha" test battery, a modified version of which<br />
had already been successfully applied to army volunteers.<br />
The EVP comprises a sophisticated biographical questionnaire mainly referring to interests,<br />
skills and general performance factors, but its core is a test battery covering the areas of general<br />
ability and educational level, perception and reaction, mechanical and electrical engineering<br />
comprehension, as well as some further faculties related to specialist functions. In defined cases,<br />
a group situation test or an interview with the psychologist, or more test procedures can be<br />
applied.<br />
The same test battery is applied to army and air force volunteer applicants (except officer<br />
applicants) in order to facilitate psychological diagnostics for those applicants who are liable to<br />
compulsory service as well. The test battery for volunteers includes additional test procedures.<br />
Table 1: Aptitude and Placement Examination for GFAF Draftees (EVP)<br />
Biographical analysis, with special regard to performance factors, interests and activities, school<br />
and occupational training<br />
Test battery: General intelligence and educational level, technical comprehension (mechanical<br />
and electric engineering), perception-reaction capacity; under defined circumstances also:<br />
Perception tests, group situation test, interview, etc.<br />
Evaluation of behavior characteristics and expression in writing<br />
Aptitude-Oriented Manpower Replacement on the Basis of EVP Assignment<br />
Proposals - Some Remarks on the GFAF Recruiting System<br />
The EVP psychologist, on the basis of his diagnostic findings, works out for each draftee<br />
proposals forhis aptitude-oriented placement in military service. The psychologist’s assignment<br />
options are mere recommendations to the recruiting agencies since as yet a draftee has no legal<br />
claim to be trained according to his EVP aptitude assessment. The administration officials in<br />
charge of personnel replacement are instructed to give priority to the EVP results.<br />
Each recruiting official has to record, by data input into the central computer, the degree to which<br />
he has taken aptitude objectives into account in every single replacement decision, i.e. regarding<br />
every single draftee who was given an assignment. The following levels of quality in personnel<br />
replacement are discerned:<br />
1 - Aptitude-oriented replacement<br />
2 - Job-related replacement<br />
3 - Occupation-related replacement<br />
4 - “Quantitative” replacement (regardless of aptitude).<br />
In this list, consideration of aptitude criteria decreases from step to step.<br />
1) Aptitude-Oriented Personnel Replacement<br />
Each military occupation on entrance level is characterized by a job title and a corresponding<br />
specialty number stating the military service (army, air force, navy) and the type of job. Groups<br />
of similar jobs are combined and labeled by an alphanumeric “assignment symbol”. These<br />
symbols (over a hundred) were specially designed to facilitate personnel replacement. Most of<br />
the symbols comprise several specialties of equal medical and psychological job requirements.<br />
The assignment symbols and their respective job titles and specialty numbers are listed in the<br />
so-called Personnel Requisition Table where the symbols are again grouped with respect to<br />
different fields of service, e.g. artillery functions, aircraft repair, medical duties. The Personnel<br />
Requisition Table also contains additional requirements and hints for placement, linked to<br />
assignment symbols, as e.g. a certain civilian occupational training which is a prerequisite of an<br />
assignment, or whether high school graduates are wanted for these jobs.<br />
The troops announce their manpower requirements by giving the assignment symbols. At<br />
present, the core requirements for each of the four annual call-ups are announced half a year<br />
before. Only shortly before each call-up term can the complete personnel requisition be set up<br />
which includes personnel fluctuations by drop-outs, organizational changes, changes in the<br />
degree of medical fitness, enlistment as volunteer, etc.<br />
The recruiting organization can, after its preparatory activities during the course of<br />
conscription (registration of men liable to service, muster, EVP), dispose of accumulated and<br />
computer-represented data on every single man due for conscription. Even before psychological assignment<br />
proposals are present, the computer will automatically pick out a provisional assignment<br />
symbol corresponding to a man’s civilian occupational training (if he has any). The psychological<br />
assignment proposals are also recorded in the central computer. They, too, are given in<br />
terms of assignment symbols, relating to the above-mentioned Personnel Requisition Table. At<br />
present, the psychologist may propose up to 9 different assignments for a draftee.<br />
Following a computer-aided optimizing model, the manpower requirements of the troops - in<br />
terms ofassignment symbols- are shared out between the subregional recruiting offices. Those<br />
recruiting offices of a military district which will, according to their stock of assignment<br />
symbols, best be able to meet the required symbols are allotted the requisitions.<br />
The computer will also support the recruiting official in placing a draftee on a specified slot.<br />
The machine will automatically provide a placement proposal by fitting a requisition symbol at<br />
issue to one of a man's given psychologically based assignment symbols. Yet a great number<br />
of placements are still carried out manually, because persons with special characteristics or<br />
certain personal and social circumstances (unemployed, married, medical doctors, etc.) have to<br />
be considered with priority.<br />
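The automatic placement step described above can be read as a simple matching procedure; a<br />
sketch follows, with invented symbols (the real system works on the central computer's<br />
requisition data, not on small in-memory lists):<br />

```python
# Fit an open requisition symbol to one of a draftee's proposed
# assignment symbols. Symbols are invented for illustration.

def propose_placement(open_requisitions, proposed_symbols):
    """Return the first proposed symbol with an open requisition, else None."""
    open_set = set(open_requisitions)
    for symbol in proposed_symbols:     # psychologist's order of preference
        if symbol in open_set:
            return symbol
    return None                         # left for manual placement

requisitions = ["A12", "B07", "C03"]
proposals = ["D99", "B07"]              # up to 9 proposals per draftee
print(propose_placement(requisitions, proposals))  # → B07
```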
2) Job-Related Personnel Replacement<br />
Sometimes a requisition of a certain job and respective symbol is at issue for which a man due<br />
for that call-up term, and bearing a corresponding assignment symbol, cannot be found.<br />
Conversely, there may be a draftee who, for social or occupational reasons, is to be called<br />
up for a certain term when there happen to be no requisitions for the symbols proposed for him.<br />
To help the recruiting official in cases like these, the Psychological Service has supplied special<br />
lists which indicate whether a given symbol may be substituted by another one because ofsimilar<br />
aptitude factors, or whether a symbol for a highly qualified job may be replaced by another one<br />
calling for less qualification. For instance, a draftee whose aptitude as a paratroop er has been<br />
stated, will likewise be apt as a guard and security soldier, even if this assignment symbol should<br />
not be proposed for him.<br />
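The substitution lists can be pictured as a simple lookup table that expands a draftee's proposed symbols by their permitted substitutes. The symbol names and substitution pairs below are invented for illustration; the real lists are maintained by the Psychological Service.

```python
# Sketch of the substitution lists: a symbol may be replaced by another
# with similar aptitude factors, or by one demanding less qualification.
# Table contents are illustrative, not the actual lists.

SUBSTITUTES = {
    "paratrooper": ["guard_and_security"],   # similar aptitude factors
    "radio_relay": ["field_cable"],          # lower qualification demanded
}

def usable_symbols(proposed):
    """Expand a draftee's proposed symbols by their permitted substitutes."""
    expanded = list(proposed)
    for symbol in proposed:
        for sub in SUBSTITUTES.get(symbol, []):
            if sub not in expanded:
                expanded.append(sub)
    return expanded

symbols = usable_symbols(["paratrooper"])
```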
3) and 4): Occupation-Oriented and Quantitative Replacement<br />
In these cases, a position is filled regardless of the psychologist's aptitude-oriented assignment<br />
proposals, or, as an exception, there are no such proposals at all. Under condition 3), the<br />
assignment will at least follow the man's civilian occupational training. Most of these cases<br />
concern draftees in exceptional life situations. In such placement<br />
decisions, the psychologist in charge shall collaborate as a consultant.<br />
In about 95 % of the call-ups, the psychologist's assignment proposals have been taken into<br />
account by the recruiting officials, as is shown by the following table which reflects a long-term<br />
state of affairs:<br />
Table 2: Quality of Personnel Replacement with Regard to Psychological<br />
Aptitude Criteria - May 1990 -<br />
Percentage of Placements<br />
              Aptitude-oriented   Job-related   Occupation-related   Quantitative<br />
Army                 95                4                0                  0<br />
Air Force            95                4                1                  1<br />
Navy                 92                6                0                  1<br />
Total                95                4                0                  1<br />
The Psychologist’s Method of Proposing Assignment Symbols<br />
In formulating his assignment proposals, the psychologist goes by the system of symbols laid<br />
down in the Personnel Requisition Table. The psychological aptitude prerequisites for the<br />
military jobs are compiled in the so-called Symbol Assignment Table. This table was issued by<br />
the Psychological Service and is structured in the same way as the Personnel Requisition Table,<br />
giving the assignment symbols instead of specialty numbers. In this table, psychological aptitude<br />
profiles are set up for each assignment symbol according to the method of multiple cut-off scores.<br />
Different kinds of prerequisites are attached to each symbol which have to be observed by the<br />
psychologist:<br />
Table 3: Psychologically-Based Assignment Proposals in Accord with:<br />
- medical requirements<br />
- basic intelligence level<br />
- cut-off scores in the relevant subtests<br />
- (for certain symbols:) additional indispensable or desirable aptitude prerequisites such as<br />
knowledge of the English language, driver's license, etc.<br />
- administrative remarks<br />
- specified civilian occupational training, if indicated in the Personnel Requisition Table<br />
The psychologist will compare the total pattern of his diagnostic findings to the job characteristics<br />
of the assignment symbols, especially to their concretized medical, occupational, test<br />
and other implications, and pick out the ones corresponding to the draftee’s aptitudes. Assignment<br />
symbols ruled out medically are absolutely excluded. With respect to the other aptitude<br />
prerequisites, the psychologist is normally given a high judgment factor:<br />
a) He may go below the prescribed cut-off scores (regarding intelligence level and subtest<br />
results) if the difference is within the confidence limits given by the test reliability.<br />
b) Major deviations from the test profiles are permitted in individual cases if psycho-diagnostically<br />
founded. The same is valid for deviations from occupational and other prerequisites.<br />
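The multiple cut-off method, together with the reliability-based tolerance of point a), can be sketched as follows. The cut-off values, standard deviation, and reliability are invented for illustration; the tolerance uses the standard error of measurement implied by the test reliability.

```python
import math

def meets_cutoffs(scores, cutoffs, sd, reliability, z=1.0):
    """Multiple cut-off screening: every relevant subtest must reach its
    cut-off score, but the psychologist may go below a cut-off within
    the confidence limits implied by the test's reliability
    (standard error of measurement = sd * sqrt(1 - reliability))."""
    sem = sd * math.sqrt(1.0 - reliability)
    tolerance = z * sem
    return all(scores[t] >= cutoffs[t] - tolerance for t in cutoffs)

# Hypothetical profile for one assignment symbol (T-score metric, sd = 10)
cutoffs = {"figure_reasoning": 50, "signal_discernment": 55}
scores = {"figure_reasoning": 52, "signal_discernment": 52}
ok = meets_cutoffs(scores, cutoffs, sd=10, reliability=0.84)
```

With reliability .84 and sd 10, the tolerance is 4 score points, so a 52 against a cut-off of 55 still passes, while a larger shortfall would not.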
The psychologist then ranks the assignment symbols he is going to propose. The priorities are<br />
subject to his judgment. He will take into account for which symbol the aptitude is best, and<br />
which are the draftee’s personal interests and preferences. If a draftee is suited for a so-called<br />
deficit symbol for which manpower replacement is difficult, this is regularly given priority. A<br />
list of these symbols is available for the psychologist.<br />
Methodical Deficiencies in Aptitude Assessment<br />
The test profiles and cut-off scores established for the assignment symbols were set up according<br />
to expert ratings. They were not based on detailed job analyses. Systematic research on the<br />
validity of the EVP test methods and the performance of draftees in the jobs assigned to them is<br />
missing for most assignment symbols. A formal comparison via central computer data between<br />
the assignment for which a man was called up, and the specialty he was awarded after basic<br />
training, gave only 75 % congruence. Deficiencies of psychological prognostic<br />
methods are only partly the cause of this low rate. Many conscripts have to be moved during their<br />
basic training for various organizational and medical reasons, so that they will complete their<br />
service in a job other than the one assigned by the recruiting agencies. Nevertheless, the aim of<br />
the Psychological Service is to increase the percentage of correspondence by methodological<br />
improvements.<br />
Table 4: Aptitude Characteristics Relevant in <strong>Military</strong> Jobs<br />
Perception and Reaction          Reasoning<br />
Signal Shape Discernment         Verbal Skills<br />
Spatial Imagination              Memory<br />
Achievement Motivation           Mechanical/Technical Comprehension<br />
Reliability                      Electric Engineering Comprehension<br />
Concentration/Stress Tolerance   Psychomotor Coordination<br />
Arithmetical Comprehension       Social Competency<br />
As a first step, aptitude characteristics were identified by expert rating, in cooperation with the<br />
military services, for more than 400 military jobs listed in the Personnel Requisition Table. For the<br />
14 characteristics found, see Table 4. An attempt to operationalize these characteristics by<br />
psycho-diagnostic constructs and corresponding test methods which might allow for aptitude<br />
assessment, showed that only part of these constructs are covered by traditional EVP test<br />
methods. Important characteristics, such as<br />
- spatial imagination<br />
-memory<br />
- psycho-motor coordination<br />
- stress tolerance<br />
do not seem to be represented in our test procedures.<br />
Thorough validation studies therefore seem indispensable. At present, 36 psychologists of the<br />
recruiting organization are investigating some 30 military jobs (listed under 22 assignment<br />
symbols). The studies include:<br />
- detailed job description and analysis of aptitude demands<br />
- identification and operationalization of probation criteria such as award of specialty, successful<br />
completion of courses, assessment by superiors, as well as personal criteria such as job<br />
satisfaction, or interest in later enlistment as a volunteer<br />
- studies on probation and validation of the traditional EVP examination methods (test procedures,<br />
biographical data, etc.)<br />
- implementation and examination of other test methods, and development of new test methods<br />
if necessary; probation study on these methods.<br />
Table 5: Study on Job Characteristics: Radio Relay Soldier (Scale: 1 [best] to 7)<br />
[Columns: Traditional Test Profile; Proposed Test Profile (Operationalized Job Characteristics)]<br />
Test Methods: General Intelligence Index, Figure Reasoning Test, Word Analogy Test,<br />
Arithmetical Comprehension Test, Orthography Test, Mechanical Comprehension Test,<br />
Electric Engineering Comprehension Test, Reaction-Perception Test, Signal Discernment Test,<br />
Memory Test, Spatial Imagination Test, Concentration Test<br />
Most of the researchers have presented sophisticated job analyses and identified probation<br />
criteria. Results of the job analyses show that several job titles listed under the same<br />
assignment symbol in the Personnel Requisition Table differ in their aptitude characteristics<br />
to a degree that suggests separating them. For numerous jobs, psycho-diagnostic constructs<br />
were found for which our EVP methods do not provide sufficient information (see Table 5 for<br />
the radio relay soldier). They will probably be supplemented by test procedures which will allow<br />
for prognosis of concentration and stress tolerance, memory, and spatial imagination.<br />
Summary<br />
A random-based system of conscript manpower replacement in the German Bundeswehr proved<br />
unable to ensure sufficient qualification of recruits in their military jobs. Since 1965,<br />
conscripts have undergone a psychological Aptitude and Placement Examination (EVP) before they are<br />
called up for service. Roughly 75 % of conscripts complete their training regularly by being<br />
awarded the specialty corresponding to their assignment. The aim of the Psychological Service<br />
is to increase this percentage by detecting as yet unexploited abilities in conscripts, and making<br />
use of them in personnel replacement. This implies improvements in the methodology of<br />
aptitude diagnosis, especially also the application of new types of tests. By means of psychological<br />
job analysis, work characteristics which have not been covered by EVP diagnostics, are<br />
to be identified, and appropriate examination methods are to be developed. Additionally,<br />
empirical studies are to be carried out to investigate the validity of our present examination<br />
methods with regard to military job demands. As a first step, aptitude characteristics of the<br />
military jobs taking part in the quarterly replacement were categorized and operationalized by<br />
psycho-diagnostic constructs which might allow for aptitude assessment. Inspection of these<br />
constructs shows that part of them are covered by traditional EVP test methods while some<br />
important characteristics do not seem to be methodically represented in our Entrance Examination.<br />
At the moment, validation studies are being carried out on 28 different military jobs for<br />
which requirements are urgent and in which one single aptitude characteristic is prominent.<br />
Investigation designs and some first results are available.<br />
DEVELOPING A TRAINING TIME & PROFICIENCY MODEL FOR ESTIMATING<br />
AIR FORCE SPECIALTY TRAINING REQUIREMENTS OF NEW WEAPON SYSTEMS<br />
David S. Vaughan, Jimmy L. Mitchell, & J. R. Knight<br />
McDonnell Douglas Missile Systems Company<br />
Winston R. Bennett & David V. Buckenmyer<br />
Training Systems Division, Air Force Human Resources Laboratory<br />
Abstract<br />
Estimating training costs and training capacity constraints are among the major manpower,<br />
personnel, and training issues in the development of new weapon systems. Use of the recently<br />
developed Training Decisions Modeling Technology in the systems acquisition process is<br />
problematic since no occupational survey data will be available as a basis for modeling the specialty,<br />
its jobs, and its training. This paper reports an innovative experimental approach using subject<br />
matter experts’ ratings of generic skill and knowledge categories for the anticipated work to predict<br />
training time and proficiencies (training setting-specific learning curves). Regression analysis<br />
indicates that substantial proportions of the variance in training time curves can be predicted from<br />
such ratings. This approach may improve training decision making and logistic support analyses<br />
early in the new weapon system acquisition process.<br />
Background<br />
[The Training Decisions System (TDS) is a technology for determining the training] required<br />
for an occupation (Ruck, 1982). It includes procedures for developing data bases<br />
and modeling the dynamic flow of people through jobs and through both formal training and<br />
on-the-job training. Furthermore, the system includes modeling and optimization capabilities<br />
which provide estimates of training quantities, costs, and capacities for both formal training<br />
and on-the-job training (Vaughan et al., 1989).<br />
Problem - MPT Decisions in the New Weapon Systems Acquisition Process<br />
In the New Weapon Systems Acquisition Process (NWSAP), the assessment of changes<br />
required in manpower, personnel, and training programs is difficult (Gentner, 1988) - the<br />
problem is particularly acute for the largely-hidden on-the-job training (OJT) costs and OJT<br />
capacity of units which will receive the new system. The TDS, with its capability to estimate<br />
such costs and capacities, may be of considerable value in helping evaluate MPT costs and<br />
capacities in NWSAP studies, if TDS procedures can be adapted to predict needed task<br />
characteristics and to model expected impacts on job and training patterns.<br />
TDS Training-Time Models<br />
Training-time models are important components of the TDS data base. These models<br />
may be thought of as learning curves; they translate training time on a group of tasks (task<br />
module) into the proficiency, relative to full proficiency, obtained from such training. Figure<br />
1 illustrates a set of training-time models for an aircraft maintenance task. Note that<br />
separate learning curves were developed for several major training settings or training<br />
delivery methods, including classroom, correspondence course, guided hands-on, and OJT.<br />
These training-time models permit different training delivery methods to be traded off to<br />
find the best way, or combination of ways, to deliver training for a particular task. These<br />
training-time models play a critical role in the TDS model. In particular, they are the basis<br />
for estimating OJT training quantities.<br />
Figure 1. Training Allocation Curve for Aircraft Environmental Systems Task Module 34.<br />
(Axes: Proficiency (%) against Training Hours, 0-40.)<br />
In the TDS R&D work, training time models were developed from SMEs' judgments<br />
concerning training times in various training settings required to reach full proficiency. This<br />
approach proved satisfactory for ongoing MPT planning applications (Vaughan et al., 1989).<br />
However, it poses several problems for the weapon-system-design application. First, it<br />
requires SMEs who are familiar with training on the subject tasks. For a new weapon<br />
system, there are no SMEs with “hands on” experience with training. This is a common<br />
problem in Logistics Support Analysis (LSA); the usual solution is to find existing systems<br />
that are comparable to the new system; data and SMEs from those existing comparable<br />
systems are then used. In general, this approach involving comparable existing systems could be used to<br />
estimate training time models for the new weapon system tasks. However, because the new<br />
system often makes use of technology not incorporated in any comparable existing system,<br />
some of the new tasks have no counterparts on existing systems. Thus, the comparable<br />
existing system approach is not entirely satisfactory for our use.<br />
Experimental Approach<br />
In the weapon-system-design application, the TDS should be sensitive to design changes<br />
and should provide feedback to designers concerning which features or aspects of their<br />
designs are the primary drivers of training requirements. The training-time-modeling method<br />
does not identify which task features or characteristics determine a task’s training time model<br />
and cannot provide feedback to designers concerning how to change a design in order to reduce<br />
training requirements. That method would rely entirely on SMEs’ judgments based on global<br />
task experience to obtain the training time models. As a consequence, the method is not<br />
likely to be very sensitive to impacts of design changes on task training times.<br />
Equation 1 is the model that we used on the TDS R&D to estimate the training time<br />
curves such as illustrated in Figure 1:<br />
p = a_cl h_cl + a_cl2 h_cl^2 + a_cor h_cor + a_cor2 h_cor^2 + a_ho h_ho + a_ho2 h_ho^2 + a_ojt h_ojt + a_ojt2 h_ojt^2<br />
[Equation 1]<br />
where p = relative proficiency, the a's = regression weights, the h's = training hours in the various<br />
training settings, and the subscripts for training settings are defined as:<br />
cl = classroom,<br />
cor = correspondence course,<br />
ho = guided hands on (Air Force field training detachment courses, etc.), and<br />
ojt = on-the-job training.<br />
The model of equation 1 has several features. First, it has no additive constant; zero<br />
training hours produces zero proficiency. Second, each curve of Figure 1 corresponds to the<br />
second-order polynomial equation section of equation 1 associated with a particular training<br />
setting. Third, the polynomial equation segment associated with each training setting is<br />
negatively accelerated. All eight model parameters associated with a particular task were<br />
estimated simultaneously in a single regression analysis.<br />
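Because Equation 1 has no additive constant, the eight parameters for a task can be estimated by ordinary no-intercept least squares on the linear and squared training-hour terms. The sketch below uses synthetic data; it only illustrates the estimation step, not the actual TDS data or software.

```python
import numpy as np

def fit_training_time_model(hours, proficiency):
    """hours: (n, 4) array of training hours in the four settings
    (cl, cor, ho, ojt); proficiency: (n,) relative proficiencies.
    Returns the 8 regression weights (a linear and a squared term per
    setting). No additive constant: zero hours -> zero proficiency."""
    X = np.hstack([hours, hours ** 2])        # h and h^2 for each setting
    coef, *_ = np.linalg.lstsq(X, proficiency, rcond=None)
    return coef

# Synthetic illustration: positive linear and small negative quadratic
# weights give the negatively accelerated curves described in the text.
rng = np.random.default_rng(0)
hours = rng.uniform(0, 40, size=(200, 4))
true = np.array([0.04, 0.03, 0.05, 0.02, -4e-4, -3e-4, -5e-4, -2e-4])
p = np.hstack([hours, hours ** 2]) @ true
coef = fit_training_time_model(hours, p)
```

On noiseless synthetic data the weights are recovered exactly; with real ratings the fit is of course approximate.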
For the weapon system design TDS application, our objective is to replace the separate<br />
training time equations for each task module with a single equation that can be applied to<br />
any task. In the desired equation, task modules are described by scores on scales which<br />
reflect various skill and knowledge requirements. The first step in developing such a training<br />
time model is to generalize the task-specific training time model of equation 1 to cover many<br />
tasks. This can be done by introducing dummy-coded task identification variables:<br />
p = sum(i=1,t) [ a_cl,i h_cl x_i + a_cl2,i h_cl^2 x_i + a_cor,i h_cor x_i + a_cor2,i h_cor^2 x_i + a_ho,i h_ho x_i + a_ho2,i h_ho^2 x_i + a_ojt,i h_ojt x_i + a_ojt2,i h_ojt^2 x_i ]<br />
[Equation 2]<br />
where x_i = dummy-coded task identification variable for task i, i = 1...t, and<br />
x_i = 1 if the current observation is for task i, 0 otherwise.<br />
Equation 2 may be thought of as a model whose variables are interactions of tasks and<br />
training hours. This model contains 8 times t (number of task modules) interaction predictor<br />
variables. Consider a model in which the task indicator variables x_i are replaced<br />
with task descriptions in the form of scores for tasks on skill and knowledge scales:<br />
p = sum(j=1,r) [ a_cl,j h_cl y_j + a_cl2,j h_cl^2 y_j + a_cor,j h_cor y_j + a_cor2,j h_cor^2 y_j + a_ho,j h_ho y_j + a_ho2,j h_ho^2 y_j + a_ojt,j h_ojt y_j + a_ojt2,j h_ojt^2 y_j ]<br />
[Equation 3]<br />
where y_j = score for the current task on rating scale j, j = 1...r.<br />
Equation 3 may be thought of as a special case of equation 2, in which the task by<br />
training hour interaction is restricted to that portion attributable to the task rating scale<br />
scores. If the scores measure the task features that drive their training time models, then<br />
equation 3 will account for most of the proficiency variation that equation 2 can account for.<br />
The next step in building our training time model was to identify a set of standardized<br />
skill and knowledge scales. For this purpose, we adopted a set of 26 skill and knowledge<br />
dimensions that was developed by occupational analysts at the USAF Occupational<br />
Measurement Center for classifying tasks in various occupations (Bell & Thomasson, 1984).<br />
More recently, these task dimensions have been revised by researchers for use in assessing<br />
skill transferability between occupations (Lance, Kavanagh, & Gould, 1989).<br />
We obtained ratings on each of the 26 scales (see Figure 2) for all task modules in the<br />
Aircraft Environmental Systems Maintenance (Air Force Specialty 423X1) occupation. This<br />
occupation contains 57 task modules, each composed of one or more occupational survey<br />
tasks (Perrin, Knight, Mitchell, Vaughan, & Yadrick, 1988). The ratings were obtained<br />
specifically for this R&D work from five Air Force Non-commissioned Officers (NCOs) who<br />
were experienced in the Aircraft Environmental Systems Maintenance occupation.<br />
Agreement among the five raters was measured for each scale by the intraclass correlation<br />
or omega-squared (Hayes & Winkler, 1971). This intraclass correlation is equivalent to the<br />
R_kk measure often used to evaluate occupational survey task factor ratings. For raw ratings,<br />
intraclass correlations for many scales were zero<br />
or negative. A standardization transformation to<br />
remove scale use differences among raters was<br />
needed. For these scales, a zero rating has<br />
absolute meaning--that a task requires no skills<br />
or knowledges related to a particular scale. A<br />
standardization transformation should not change<br />
zero ratings. For this reason, the following<br />
standardization was applied to the ratings:<br />
y_ijk = x_ijk / [ sum(k=1,57) x_ijk ] [Equation 4]<br />
where y_ijk = standardized rating for rater i on scale j and task k, and<br />
x_ijk = raw rating for rater i on scale j and task k.<br />
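Equation 4 in code form (illustrative data, three task modules instead of 57): dividing each rating by the rater's own total over all task modules removes scale-use differences between raters while leaving absolute zeros at zero.

```python
# Equation 4 as code: each rater's rating on a scale is divided by that
# rater's total over all task modules. A rater who uses the scale twice
# as generously ends up with the same standardized profile; zeros stay zero.

def standardize(ratings):
    """ratings: {rater: [raw ratings over task modules]} for one scale.
    Returns the same structure with each rating divided by the rater's sum."""
    out = {}
    for rater, raw in ratings.items():
        total = sum(raw)
        out[rater] = [x / total if total else 0.0 for x in raw]
    return out

# Rater r2 uses only half of r1's range, but their profiles agree
raw = {"r1": [2, 0, 6], "r2": [1, 0, 3]}
std = standardize(raw)
```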
Figure 2 presents interrater agreement<br />
statistics for the 26 rating scales after this<br />
standardization. The ratings have acceptable<br />
interrater agreement. Three of the scales,<br />
Medical-Patient Care, Medical-Equipment<br />
Oriented, and Medical Procedures had no<br />
non-zero ratings for tasks in this occupation.<br />
Thus, meaningful intraclass correlation statistics<br />
could not be computed for these scales although<br />
all raters agreed on all ratings for these scales<br />
(zero).<br />
For modeling purposes, we augmented the<br />
training-time data file from the TDS R&D with<br />
scores on the 26 skill and knowledge scales. We<br />
used mean standardized ratings across raters for<br />
each task and scale. If the 26 scales are a useful<br />
basis for estimating training-time models, then<br />
equation 3, which uses scores on the scales along<br />
with training times to predict proficiency, should<br />
account for most of the proficiency variation<br />
accounted for by equation 2, which includes<br />
actual task identities.<br />
Results<br />
Scale                                               Omega2   Rkk<br />
1. Clerical                                           .28    .66<br />
2. Computational                                      .13    .44<br />
3. Office Equipment Operation                         .34    .72<br />
4. Mechanical                                         .13    .43<br />
5. Simple Mechanical Equipment/Systems Operation      .06    .23<br />
6. Complex Mechanical Equipment/Systems Operation     .01    .06<br />
7. Mechanical-Electrical                              .15    .47<br />
8. Mechanical-Electronic                              .20    .56<br />
9. Electrical                                         .11    .38<br />
10. Electronic                                        .20    .56<br />
11. Electrical-Mechanical                             .22    .58<br />
12. Electrical-Electronic                             .13    .43<br />
13. Electronic-Mechanical                             .19    .54<br />
14. Simple Physical Labor                             .00    .00<br />
15. Medical-Patient Care                               *      *<br />
16. Medical-Equipment Oriented                         *      *<br />
17. Medical Procedures                                 *      *<br />
18. Simple Nontechnical Procedures                    .02    .13<br />
19. Communicative-Oral                                .20    .55<br />
20. Communicative-Written                             .49    .83<br />
21. General Tasks or Procedures                       .13    .43<br />
22. Reasoning/Planning/Analyzing                      .05    .21<br />
23. Scientific Math Reasoning or Calculations         .08    .31<br />
24. Special Talents                                   .05    .21<br />
25. Supervisory                                       .27    .65<br />
Note: Omega squared is the intraclass correlation (inter-rater agreement); it is equivalent<br />
to R_kk1. R_kk is the estimated reliability for the mean rating from five raters.<br />
*All tasks had zero ratings; inter-rater agreement statistics are meaningless.<br />
Figure 2. Interrater Agreement Data for the 26 Skill and Knowledge Scales.<br />
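R_kk in Figure 2 is the Spearman-Brown stepped-up reliability of the mean of the five raters, computed from the single-rater intraclass correlation (omega squared); the function below checks this relation against the Clerical and Office Equipment Operation rows of the figure.

```python
def mean_rating_reliability(omega_sq, k=5):
    """Spearman-Brown prophecy formula: reliability of the mean of k
    raters, given the single-rater intraclass correlation (omega^2)."""
    return k * omega_sq / (1 + (k - 1) * omega_sq)

# Clerical row of Figure 2: omega^2 = .28 should step up to R_kk = .66
rkk = round(mean_rating_reliability(0.28), 2)
```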
Our first modeling activity was to fit the regression model of equation 2. The R^2 for this<br />
model was .65, which is statistically significantly greater than zero: F(451,2255) = 9.5, p < .001.<br />
Next, we fit the regression model of equation 3, which replaced task identification variables<br />
with scores on the skill and knowledge scales. The R^2 for this model was .52, which is also<br />
statistically significant: F(184,2525) = 14.9, p < .001. If one views the skill and<br />
knowledge-based model (equation 3) as a restricted version of the full task identity model<br />
(equation 2), the R^2 increase associated with the full model may be tested. That test shows<br />
that the R^2 difference between these models, .13, is statistically significant: F(267,2258) =<br />
3.1, p < .01. However, the skill and knowledge scales model accounts for 80% of the variance<br />
accounted for by the full-task model. Thus, the skill and knowledge model has great<br />
practical value for estimating TDS training time models.<br />
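The test of the R^2 increase reported above is the standard F test for nested regression models; with the paper's numbers (full R^2 = .65, restricted R^2 = .52, 267 numerator and 2258 denominator degrees of freedom) it reproduces the reported value of about 3.1.

```python
def nested_f(r2_full, r2_restricted, df_diff, df_resid):
    """F statistic for the R^2 increase of a full regression model over
    a nested restricted model:
    F = ((R2f - R2r) / df_diff) / ((1 - R2f) / df_resid)."""
    return ((r2_full - r2_restricted) / df_diff) / ((1 - r2_full) / df_resid)

# Numbers from the text: .65 vs .52 with F(267, 2258)
f = nested_f(0.65, 0.52, 267, 2258)
```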
Discussion<br />
The skill-and-knowledge scale model is much more accurate than we expected. Also, it<br />
permits training-time models--learning curves--to be estimated for tasks early in the design<br />
process, and it provides feedback to designers concerning which particular task skill and<br />
knowledge requirements are causing high training times. For these reasons, we believe that<br />
the skill-and-knowledge training-time model represents a significant step forward in our<br />
ability to design systems with acceptable training requirements, and to incorporate MPT<br />
considerations into the weapons system design process.<br />
REFERENCES<br />
Bell, J., & Thomasson, M. (1984). Job Categorization Project. Randolph AFB, TX: United States Air Force<br />
Occupational Measurement Center.<br />
Christal, R.E., & Weissmuller, J.J. (1988). Job-task inventory analysis. In S. Gael (Ed.), Job Analysis Handbook<br />
for Business, Industry, and Government. New York: John Wiley and Sons, Inc. (Chapter 9.3).<br />
Gentner, F.C. (1988, December). USAF Model Manpower, Personnel and Training Organization--An Update.<br />
Proceedings of the 30th Annual Conference of the Military Testing <strong>Association</strong>. Arlington, VA: U. S. Army<br />
Research Institute.<br />
Hayes, W.L., & Winkler, R.L. (1971). Statistics: Probability, Inference and Decision. New York: Holt, Rinehart<br />
& Winston.<br />
Lance, C.E., Kavanagh, M.J., & Gould, R.B. (1989, August). Development and Convergent Validity of Cross-Job<br />
Movement Indices. Paper presented at the annual meeting of the American Psychological <strong>Association</strong>, New<br />
Orleans, LA.<br />
Mitchell, J.L., Ruck, H.W., & Driskill, W.E. (1988). Task-based training program development. In S. Gael (Ed.),<br />
Job Analysis Handbook for Business, Industry, and Government. New York: John Wiley & Sons, Inc.<br />
Perrin, B.M., Knight, J.R., Mitchell, J.L., Vaughan, D.S., & Yadrick, R.M. (1988). Training Decisions System:<br />
Development of the Task Characteristics Subsystem (AFHRL-TR-88-15). Brooks AFB, TX: Training Systems<br />
Division, Air Force Human Resources Laboratory.<br />
Ruck, H.W. (1982, February). Research and development of a training decisions system. Proceedings of the<br />
Society for Applied Learning Technology. Orlando, FL.<br />
Ruck, H.W., & Birdlebough, M.W. (1977). An innovation in identifying Air Force quantitative training<br />
requirements. Proceedings of the 19th Annual Conference of the Military Testing <strong>Association</strong>. San Antonio, TX:<br />
Air Force Human Resources Laboratory and the USAF Occupational Measurement Center.<br />
Vaughan, D.S., Mitchell, J.L., Yadrick, R.M., Perrin, B.M., Knight, J.R., Eschenbrenner, A.J., Rueter, F.H., &<br />
Feldsott, S. (1989, June). Research and Development of the Training Decisions System (AFHRL-TR-88-50).<br />
Brooks AFB, TX: Training Systems Division, Air Force Human Resources Laboratory.<br />
EVALUATING TRAINING PROGRAM MODIFICATIONS<br />
Deborah Lawson McCormick and Paul L. Jones<br />
Naval Technical Training Command<br />
Evaluating changes in training programs is never a simple<br />
task, even under laboratory conditions where threats to validity<br />
can be controlled. In operational settings evaluation may appear<br />
to be an insurmountable problem -- one in which good evaluation<br />
methodology does not seem feasible. One major problem for<br />
evaluators in operational settings is that they are often not<br />
consulted until after training modifications have already been<br />
initiated. As a result, both experimental control and<br />
opportunities for data collection are severely limited.<br />
Even in those rare cases where evaluators are a part of the<br />
implementation from its onset, problems exist. For example, an<br />
evaluation design which uses equivalent control and experimental<br />
groups is often not possible in on-going training programs. In<br />
addition, operational settings are inherently dynamic environments;<br />
consequently, the effects of deliberate program changes are<br />
confounded with effects of other random factors which 'constantly<br />
impact the program. In these cases, isolating effects directly and<br />
unquestionably attributable to factors of the program change is<br />
impossible.<br />
This difficulty in establishing definite cause and effect<br />
relationships is sometimes used as a reason to forego evaluation.<br />
Rather than attempting a seemingly futile task, the tendency is to<br />
rely on intuition. The argument goes something like this: "These<br />
changes make sense, the students like them, the instructors like<br />
them . . . they probably work."<br />
However, increased competition for funding dollars makes the<br />
need to verify training improvement and justify additional funds<br />
crucial. Increasingly, funding sources are requiring hard data in<br />
support of dollars spent. As evaluators, we are being forced to<br />
accept that a less than perfect evaluation (that is, one which only<br />
suggests, rather than "proves," cause and effect) is better than no<br />
evaluation at all.<br />
This paper describes an evaluation model which we feel is<br />
flexible enough to prove useful in most evaluation circumstances,<br />
from the ideal condition, where evaluation has been planned in<br />
conjunction with change implementation, to those evaluation<br />
nightmares, where change implementation is complete before the<br />
evaluator is consulted. Following a brief description of the<br />
model, an application of its use is discussed.<br />
EVALUATION MODEL<br />
The evaluation system we recommend approaches evaluation on<br />
two levels. At the more immediate level, we attempt to determine<br />
effects on student performance in the specific training areas<br />
modified. For example, changes in test scores or training time in<br />
those specific content areas might be analyzed. The second level<br />
attempts to determine how the training program as a whole was<br />
affected by the modifications. Such measures as course attrition,<br />
total training time, performance in other areas of the program,<br />
etc., would be considered. Inferring cause and effect<br />
relationships becomes riskier as one moves to these more general<br />
measures of effect -- measures further removed from the proposed<br />
cause. However, modification in one area of the training program<br />
should ultimately affect the program as a whole and become<br />
manifested in these general measures. In reality, it is this<br />
broader impact which serves as the bottom line point of interest<br />
for most of our clients.<br />
The evaluation model we use can be condensed to a six-step<br />
process described below:<br />
1. Begin evaluation planning early, before implementation of the<br />
program change if possible. The evaluator should be involved as<br />
soon as possible, ideally during the modification planning stage --<br />
certainly prior to modification implementation. Many threats to<br />
validity can be anticipated and controlled if the evaluator is<br />
involved in this manner. Realistically, however, we know that this<br />
scenario seldom occurs. More often, the evaluator is called in<br />
after the modification has been implemented. For this reason, we<br />
usually find ourselves beginning with step two.<br />
2. Know the program you are about to evaluate. A thorough<br />
understanding of the nature of the program change and its impact on<br />
the general operation of the training program is critical to good<br />
evaluation. The evaluator must understand the program's<br />
objectives, the anticipated impact of the change on these<br />
objectives, and the methods used to accomplish them. In addition,<br />
the evaluator must determine what data is currently being collected<br />
to evaluate program performance and whether this data might be<br />
useful in evaluating the program change. Most importantly, a<br />
definitive statement of how the change is intended to affect the<br />
program (that is, the goal of the program change) must be<br />
formulated.<br />
3. Determine data collection procedures and gather baseline data.<br />
The purpose of baseline data is to develop a snapshot of how well<br />
the program is performing in the area to be modified prior to the<br />
change. Often you will find existing measures of performance, such<br />
as test scores, which directly address this question. In other<br />
cases, it will be necessary to introduce new data collection<br />
instruments, e.g., surveys, questionnaires, etc.<br />
Whether data collected should be restricted to only the area<br />
modified, to a broader segment of the program, or to the program in<br />
totality depends on the expected effect of the change. In general,<br />
the interdependence of program parts usually warrants a complete<br />
evaluation.<br />
4. Monitor the implementation of the change. Keep informed about<br />
how the implementation is proceeding. Document associated factors<br />
which might impact the success of the change, such as changes in<br />
instructor or student attitudes, changes in quality of either<br />
instructors or students, or changes in resources.<br />
5. After the program has stabilized following incorporation of the<br />
change, gather data for comparison with the baseline. This step<br />
involves collecting data corresponding in type to the baseline data<br />
for a sample of students under the modified program. This step<br />
will involve readministration of instruments developed for the<br />
evaluation, for example attitude questionnaires.<br />
6. Analyze data and interpret results. Most of our clients have<br />
neither the time nor the propensity to wade through a morass of<br />
statistics. Although we sometimes use fairly sophisticated<br />
statistical procedures and usually include these analyses in the<br />
report, we always attempt to synthesize the findings for our<br />
clients. We try to answer the general question of how the training<br />
program was affected by the modification in an easily accessible<br />
one-page (or one table or figure) summary.<br />
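The baseline-versus-comparison logic of steps 3 and 5 can be made concrete with a minimal sketch. This is not the authors' procedure: the counts below are invented, and the two-proportion z-test is just one reasonable way to compare a rate such as attrition before and after a program change.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Compare two rates (e.g., baseline vs. post-change attrition).
    x = number of students attriting, n = group size.
    Returns (z statistic, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # pooled rate under H0: no change
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_two_sided

# Hypothetical counts: 56 of 200 students attrited before the change,
# 38 of 200 after.
z, p = two_proportion_z(56, 200, 38, 200)
print(round(z, 2), round(p, 3))   # → 2.12 0.034
```

A significant z here supports, but does not prove, a causal claim; the caution above about inferring cause and effect from more general measures still applies.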
MODEL APPLICATION<br />
In 1987, a project known as the Model School program was<br />
initiated at Electrician Mate's (EM) School at the Naval Training<br />
Center, Great Lakes. The purpose of this project was to examine<br />
the training program for EM's at this school, explore ways to make<br />
that training better, and implement those that were feasible. As<br />
a result of this project, a number of changes took place in this<br />
program over the next two years. For instance, a technology-based<br />
learning center was instituted, changes in remediation occurred,<br />
the testing program was revised somewhat, etc.<br />
In the spring of 1990, the Research Branch of the Navy<br />
Technical Training Command was tasked with conducting an evaluation<br />
of the impact of these changes on the training program. Because we<br />
were not involved during the implementation of the project, we used<br />
a modification of the six-step approach described above.<br />
In this case, our first step was getting to know the program<br />
we were tasked with evaluating. From talking with the school staff<br />
and various other sources, we became familiar with the training<br />
program as it existed prior to the Model School Project, the<br />
objectives of the project and changes made to the training program<br />
as a result of it, and other occurrences which, although they may<br />
have been coincidental, had potential for impacting the training<br />
program. We found that many of the changes had potential for very<br />
subtle impact; for example, the staff's optimism for the program<br />
probably improved their teaching, but this notion is difficult to<br />
substantiate. Also, additional out-of-class study aids had been<br />
developed and introduced throughout the training program.<br />
Cumulatively, one would expect these changes to result in improved<br />
student performance; however, we felt the attempt to isolate and<br />
attribute effects to individual factors would be impossible in a<br />
post hoc evaluation design.<br />
With these ideas in mind, we approached the evaluation with<br />
two broad questions: (1) How did the performance of pre-Model<br />
School project students compare with the performance of post-Model<br />
School students, in terms of attrition rate, setback (repeating of<br />
course segments) rate, test scores, and number of retests? (2)<br />
What changes occurred in the intervening time period which may have<br />
impacted student performance?<br />
Next we constructed baseline data for a group of students<br />
attending the training prior to modification for use as a quasi-control<br />
group. The school had been utilizing an automated testing<br />
program which maintained students' scores on tests and number of<br />
retests taken. This data gave us a picture of performance in the<br />
individual content areas, as well as an overall measure of<br />
performance. We also collected more general performance measures<br />
such as course attrition and setback data.<br />
Corresponding information was collected for a comparison group<br />
who received the training after Model School Program<br />
implementation. Because academic ability levels of students in<br />
fundamental training courses have historically varied<br />
systematically with the season of the year, we selected our<br />
comparison group from months corresponding to those of the “control”<br />
group in an attempt to maximize equivalency of the two groups.<br />
Data were analyzed and major findings presented in the summary<br />
format shown in Figure 1. This graphic representation enabled us<br />
to overlay potential impacting factors with major measures of<br />
student performance. Our clients liked this format because it<br />
provided an at-a-glance picture of both the changes to the training<br />
program and corresponding variations in terms of student<br />
performance. In this instance, the overall impact of the program<br />
appeared to be positive in that the two major indicators of<br />
training success, attrition and setback rates, both improved.<br />
[Figure 1 -- overlay of program changes and student performance measures -- not reproduced]<br />
CONCLUSION<br />
Sometimes evaluators are hesitant to perform evaluations such<br />
as the one just described because they are too messy and imprecise.<br />
When we are coerced to perform them, we tend to apologize for the<br />
product. These evaluations do little more than provide a<br />
historical description of the program in terms of its components and<br />
its performance. But even these evaluations serve two important<br />
functions. First, they provide concise, accurate descriptive<br />
information to program managers, information otherwise not<br />
available to them. Secondly, they establish a climate conducive to<br />
evaluation. Managers become aware that many questions they would<br />
like to have answered conclusively could be answered if evaluators<br />
are consulted early in the modification process.<br />
In summary, our advice is to accept those messy evaluation<br />
projects, adapt proven evaluation methodology/procedures to your<br />
particular set of circumstances, and conduct the evaluation,<br />
exercising controls for threats to validity wherever possible. At best, you'll<br />
be able to analyze the data and draw cause and effect relationships<br />
with reasonable accuracy. At worst, you'll be able to synthesize<br />
the data and describe changes, suggesting possible causes. In<br />
either case, the client will have more information and be better<br />
equipped to make informed decisions than he/she would have been<br />
otherwise.<br />
The Effect of Reading Difficulty<br />
on Correspondence Course Performance<br />
Dr Grover E. Diehl (ECI)<br />
During the 1989-90 academic year the Air Force Extension Course<br />
Institute (ECI) broadly examined the impact of reading level on<br />
correspondence courses in the Career Development Course (CDC)<br />
program. First, reading grade levels (RGLs) of CDCs were<br />
determined using the FORCAST method. The FORCAST method involved<br />
the manual counting of words in samples of text. These RGLs were<br />
then examined to determine whether the RGLs had increased significantly<br />
on a year-by-year basis, and were also compared with 1982 data to<br />
determine whether there were significant differences between the<br />
samples. Next, RGLs were correlated with end-of-course performance<br />
by percent of first time exam failures and then by proportion<br />
of overall course failures. Following this, the FORCAST<br />
RGLs were correlated with target RGLs prepared by the Air Force<br />
Human Resources Laboratory and with computer generated RGLs<br />
using a Flesch-Kincaid formula.<br />
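For reference, the two readability formulas mentioned above can be sketched as follows. These are the standard published forms of FORCAST (RGL = 20 minus one-tenth the number of one-syllable words in a 150-word sample) and the Flesch-Kincaid grade-level formula; the vowel-group syllable counter is a crude stand-in for the manual counting the study describes, not ECI's actual procedure.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count vowel groups.
    A crude heuristic, not the manual count used in the study."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def forcast_rgl(words):
    """FORCAST: RGL = 20 - (one-syllable words in a 150-word sample) / 10."""
    sample = words[:150]
    one_syllable = sum(1 for w in sample if count_syllables(w) == 1)
    return 20 - one_syllable / 10

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade =
       0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59

# A sample of all one-syllable words yields the formula's lowest grade level:
print(forcast_rgl(["the"] * 150))   # → 5.0
```

Because the two formulas weight word and sentence features differently, they sit on different scales, which is consistent with the scale differences reported in the findings below.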
A basic intervening variable in the assessment of reading difficulty,<br />
however, was the fact that personnel and Air Force jobs<br />
were matched during enlistment processing, so that the most intellectually<br />
demanding specialties were filled with the most intellectually<br />
able personnel. One way around this problem was to<br />
calculate difference scores between the RGL targets and the obtained<br />
FORCAST RGLs -- a measure of the perceived difficulty of the<br />
material to the student -- and correlate these with failure rate.<br />
This procedure treated student ability as a covariate, with a<br />
corresponding reduction in the error portion of the prediction<br />
equation, without the necessity of using analysis of variance. An<br />
analysis of difference scores constituted the last question to<br />
be addressed.<br />
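The difference-score idea can be made concrete with a short sketch. The course-level data below are invented for illustration (the paper reports r = .2117 and r = .2459 for the real samples); the Pearson computation itself is standard.

```python
import statistics as st

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = st.mean(x), st.mean(y)
    sx, sy = st.pstdev(x), st.pstdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

# Hypothetical course-level data (illustrative only):
# difference score = obtained FORCAST RGL minus target RGL
# (positive = material written above the target reader's level).
forcast_vals = [11.2, 10.5, 12.0, 9.8, 11.6]
target_vals  = [10.0, 10.5, 10.8, 10.2, 11.0]
fail_rate    = [12.0,  8.0, 15.0,  6.0, 11.0]   # first-time exam failures, %
deficit = [f - t for f, t in zip(forcast_vals, target_vals)]
print(round(pearson_r(deficit, fail_rate), 3))   # → 0.952
```

Correlating the deficit rather than the raw RGL removes the portion of reading demand already matched to student ability, which is why the paper describes the procedure as treating ability as a covariate.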
Findings<br />
FORCAST RGL and Edition Date. No statistically significant association<br />
was found between the FORCAST reading level and the edition<br />
date of the materials (a period of about 12 years). The<br />
Pearson Product Moment Correlation coefficient (r) of FORCAST<br />
RGL with edition date was .0742 (N = 215, p =.279). To check<br />
for possible curvilinearity, a scatterplot was prepared which<br />
suggested a completely random occurrence pattern. FORCAST reading<br />
level did not vary in a linear way from year to year.<br />
Difference Between RGLs Sampled in 1982 and 1990. There was apparently<br />
sufficient variation within the samples to be statistically<br />
significant. Hotelling's T2 was 5.1998 with a probability<br />
of .027 at 1 and 2 degrees of freedom. It should be noted, however,<br />
that the test was made on a group-to-group basis and there<br />
was no indication which individual pairs may have changed the<br />
most. It was in fact possible that no pairs would significantly<br />
vary even though the full model rejected the null hypothesis.<br />
Also, since the test was non-directional, it was not possible to<br />
identify which group contained higher RGLs than the other although<br />
they appear to have been higher more recently.<br />
An observation related to this, however, was the significant<br />
correlation of the RGLs of the two samples (r = .4709, p =<br />
.001). This raised an interesting situation in which samples<br />
taken in 1982 and 1990, although significantly different, were<br />
nonetheless related. The relatedness, however, was not developmental<br />
over time. One possible solution to this ambiguity was<br />
that RGL was varying with frequent but intermittent corrections<br />
using a current "clearly written text" standard.<br />
1990 RGLs and First Time Examination Failures. RGLs were not<br />
significantly related to first time examination failure rates (r<br />
= .0741 and probability = .327).<br />
1990 RGL and Overall Course Failures. All students failing the<br />
first final examination were provided a retest. Course failure<br />
required failure of both the first examination and the<br />
reexamination. As was the case with first time exam failures,<br />
course failures were not significantly related to RGL in the<br />
1990 sample (r = .0404, probability = .403).<br />
FORCAST RGL and AFHRL Targets. The correlation between FORCAST<br />
RGLs of course materials in the 1990 sample and AFHRL targets of<br />
actual student reading ability (50th percentile reading ability)<br />
was .0695 and was not significant (p = .333). A reduced target<br />
at the 15th percentile also failed to be significantly related<br />
to the obtained FORCAST RGL (r = .0249 with probability = .439).<br />
The data failed to demonstrate that the variation within the<br />
reading ability of personnel was linearly related to FORCAST<br />
RGLs of the CDC material.<br />
FORCAST RGL and Flesch-Kincaid RGL Comparison. Due to resource<br />
limitations on the Flesch-Kincaid RGL side, comparisons were<br />
made on only one CDC consisting of four volumes. The means and<br />
SDs were 11.2725 and .5187 for FORCAST and 9.0800 and .1619 for<br />
Flesch-Kincaid. The obvious difference between the averages was<br />
significant, with Hotelling's T2 of 92.8489 and probability equal to<br />
.004 (df = 1 and 3). Flesch-Kincaid generated significantly<br />
lower RGL estimates than did FORCAST. The correlation, although<br />
large by research standards (r = .5252), was not statistically<br />
significant (p = .237).<br />
Difference Scores and Failure Rates. Using a 50th percentile<br />
personnel reading ability as a target base, the correlation of<br />
the RGL deficits with first time exam failure rate was .2117<br />
with a probability equal to .101 -- not statistically significant.<br />
When a 15th percentile target base was used, the correlation<br />
of the deficits with first time failure rates was also<br />
not significant (r = .2459, p = .068). Similar analyses of<br />
course failure rates yielded the same result.<br />
Conclusion<br />
FORCAST reading grade levels were not significantly associated<br />
with end-of-course test performance, reading grade level targets<br />
using the Air Force Reading Ability Test scale, or Flesch-Kincaid<br />
reading difficulty obtained from a computer analysis.<br />
Additionally, FORCAST reading grade levels had not changed consistently<br />
over time. There was evidence that RGL had risen<br />
slightly sometime during the eight year period but it was unclear<br />
whether the rise was continuing.<br />
Careful examination of the summed evidence suggested, however,<br />
that the null outcomes were possibly due to an aggressive<br />
"clearly written text" program within ECI. This effort, which<br />
replaced FORCAST in the mid-1980s, introduced an ongoing conscious<br />
effort on the part of the text writers and reviewers to<br />
ensure the readability of the materials. Earlier information<br />
suggested that use of FORCAST was associated with a reduction in<br />
reading difficulties to the point where FORCAST was no longer<br />
predictive. Present data suggested that the "clearly written<br />
text" standard may continue to limit the value of FORCAST as a<br />
predictive indicator.<br />
Discussing more generally the issue of attention to RGL, it was<br />
noted that most ways of determining RGL and tests designed to<br />
assess the reading ability of students were highly correlated --<br />
often as highly intercorrelated as the validity coefficients of<br />
the individual measures. Differences in outcome values were<br />
typically due to scale. The task of maintaining acceptably low<br />
reading difficulty within written materials was primarily one of<br />
maintained focus on the problem using any of several means. FORCAST<br />
was one means easily calculated by hand. The Flesch-Kincaid<br />
RGL provided here by Right-Writer, although almost<br />
necessitating a computer, was a viable option especially when<br />
the written material was already in an acceptable word processing<br />
medium. The Right-Writer output in fact contained considerable<br />
ancillary information which could be useful to writers. For<br />
example, suggestions for making writing more direct and improving<br />
sentence structure were given, and there was a listing<br />
of negative words, jargon, colloquial and misused words,<br />
questionable spellings, and words which readers may not understand.<br />
External reports such as these served to alert writers<br />
and reviewers to idiosyncrasies which may distract the student<br />
from the material and to maintain focus on reading difficulty<br />
and level.<br />
Audience: Instructional developers.<br />
For more information contact Dr Grover Diehl, ECI, Gunter AFB AL<br />
36118-5643,<br />
AUTOVON 446-3641 or commercial 205-279-3641.<br />
Navy Basic Electricity Theory Training: Past, Present, and Future<br />
Steve W. Parchman, John A. Ellis, & William E. Montague<br />
Navy Personnel Research and Development Center<br />
THE OPINIONS EXPRESSED IN THIS PAPER ARE THOSE OF THE AUTHORS,<br />
ARE NOT OFFICIAL AND DO NOT NECESSARILY REFLECT THE VIEWS OF<br />
THE NAVY DEPARTMENT<br />
Introduction<br />
Basic electricity and electronics theory training (BETT) in the Navy has historically<br />
had high attrition and setbacks and has been plagued by questions about the relevance of<br />
the course content to Navy jobs. BETT is taught as a separate topic at the beginning of<br />
more than twenty Navy A schools to more than 20,000 students annually. BETT<br />
material historically has proven difficult for students to learn and has resulted in high<br />
attrition and set-back rates. For example, in FY 88 attrition in five electrical A schools<br />
averaged 28% (AE, ET, EM, IC, DS; total annual throughput = 5000). Average setback<br />
rate for these same schools was 69%. Approximately 70% of these losses occurred in the<br />
BETT phase of these courses. Further, the abstract nature of this content has raised<br />
questions about its relevancy for vocational jobs. For example, recent research has<br />
shown that trainees who have passed course tests fail to pass relatively simple practical<br />
exercises. These problems with trainee learning have remained even in the face of<br />
substantial expenditure of effort to revise the content and to change the method of<br />
delivering it.<br />
Research on learning and training suggests that more fundamental changes in<br />
curriculum structure can lead to improvements in learning. Research and development is<br />
needed to develop and test alternative methods for training electrical and electronic<br />
theory, with the goal of reducing both attrition and setback rates by a minimum of 25%.<br />
This paper discusses Navy basic electricity and electronics theory training (BETT)<br />
with some suggestions for development of future training programs. It begins by briefly<br />
reviewing the history of Navy BETT training followed by a discussion of alternative<br />
approaches to this training that have been tried. Finally, several options for training<br />
improvements are presented.<br />
BETT History<br />
Through the 1950s and 60s, Navy electronics training was both theory and math<br />
intensive. Well qualified trainees were amply available, thanks in part to the draft. “A”<br />
School electronics courses, often eight months long, challenged the trainees and also<br />
prepared them for the rigors of the “B” schools. “B” schools of up to fifteen months were<br />
available to qualified re-enlistees. These schools resembled university engineering<br />
programs.<br />
Perhaps it was inevitable that two dozen or more schools around the country<br />
independently teaching similar content would generate pressure for consolidation. In the<br />
early 1960s, consolidations were carried out, and a common syllabus, based on Bureau of<br />
Personnel publications, was adopted at each of the major training centers.<br />
Two factors which came into play in the 1960s and 70s resulted in major changes in<br />
Navy electronics training. First, the Programmed Instruction (PI) movement reached its<br />
peak of popularity in the 60s. Use of this approach in the Navy was judged desirable,<br />
and a contractor (Westinghouse) was funded by the Bureau of Personnel to convert the<br />
basic or introductory portion of Electricity/Electronics courses into a self-teaching<br />
format. The contractor’s charter was not to change the substance of the course, but rather<br />
to convert it into a different “delivery system.” With the assistance of a committee of<br />
E/E instructors from San Diego schools, the basic lectures of the BuPers syllabus were<br />
converted into narrative and PI materials, summaries were written, test items were<br />
inserted as progress checks, and module tests, midterms, and final exams were also<br />
prepared from existing test items. In 1968, a partially self-paced compromise version of<br />
several variations of instructor-taught basic electricity and electronics was offered.<br />
Second, in 1967, NPRDC (then Navy Personnel and Training Research Laboratory)<br />
and CNATECHTRA began work on a computer-managed instruction (CMI) system.<br />
One of the courses eventually put on the CMI system was the Westinghouse conversion<br />
of the basic E/E course. With only minor modifications, it was on-line at NAS Memphis<br />
as a CMI course in 1973.<br />
A major organizational change also influenced BE/E training. Following<br />
recommendations of a review board (the Cagle Board), control of technical training was<br />
moved from Bureau of Personnel in 1972, and vested in two new organizations, Chief of<br />
Naval Education and Training (CNET) and Chief of Naval Technical Training (CNTT),<br />
with the latter absorbing the functions of CNATECHTRA. These new organizations<br />
evaluated the CMI course and concluded that this form of training could be effective and<br />
economical. A CNTT in-house group (MIISA) was created to improve and expand the<br />
CMI software. Basic E/E was consolidated in four schools and incorporated into the<br />
CMI system. In 1975, the Westinghouse version of the San Diego compromise of the<br />
BuPers version of Basic E/E was standardized throughout the Navy. Between 1975 and<br />
1984, while there were cosmetic changes to the course material, the only substantive<br />
modifications were to increase the CMI system’s ability to output various summary<br />
reports, to eliminate some “nice-to-know” material, and to add some modules on newer<br />
technologies such as transistors and advanced circuitry. The system was effective in<br />
meeting its goal of graduating large numbers of students in a significantly reduced<br />
amount of training time.<br />
In late 1984, CNET made two major decisions regarding BETT: (1) the courses<br />
would be converted from self-paced to group-paced instruction, and (2) the training would be<br />
integrated into the appropriate “A” schools (thus, BETT courses would disappear as<br />
separate entities). The conversions began in 1985 and the majority were completed by<br />
1989. The BETT courses were phased out. These conversions did not result in any<br />
major redesign of BETT training. Instead the existing BETT materials were adapted and<br />
added as the initial phase to the existing “A” school courses. The decision to add BETT<br />
to ‘A’ school courses presented an opportunity to increase the job relevance of this<br />
training. However, the schools were unable to do this during the initial adaptation phase<br />
because they did not have the resources for making major course revisions.<br />
In general, the problems with the current materials are: (1) there has not been a<br />
recent job or task analysis; the instruction on electricity was adopted from simplified<br />
older physics courses*, (2) there are opportunities for course improvements, including<br />
instructional design, test items, and laboratory experience, that should be explored, and<br />
(3) students do not seem to develop a good practical understanding of electricity and<br />
electronics. Attention to all of these issues could lead to lower attrition and setback rates,<br />
and improved transfer to job tasks.<br />
*Physicists regard the content and sequence of BETT to be outmoded theoretically. “The study of<br />
electricity at rest - “electrostatics” - used to bulk large in elementary physics. It was all that was<br />
Practical Test<br />
A practical hands-on performance exam was developed by NPRDC to test the<br />
transfer of the BETT curriculum to more relevant job situations. The BETT course<br />
objectives were evaluated to determine which would be the most common to the job a<br />
technician might do in the fleet. From this analysis five hands-on test questions were<br />
developed using real components (resistors, capacitors, conductors, and a flashlight<br />
battery) which tested the trainee’s ability to recognize and identify electrical<br />
components, determine a component’s operating condition using a multimeter, and<br />
analyze its effect in an operating circuit. Since the BETT course materials focus on use of<br />
the multimeter, the test assumed a similar focus. Prior to giving the test, the school staff<br />
evaluated it and thought that their students would have little difficulty achieving a good<br />
score.<br />
The test was given at two Navy class ‘A’ schools: once in 1986 at the Avionics<br />
Technician class ‘A’ school in Memphis, and again in 1988 at the Electrician’s Mate<br />
class ‘A’ school in Great Lakes, Illinois.<br />
The Memphis Study.<br />
The hands-on test was given to determine whether students could apply knowledge<br />
and skills learned in BETT in practical situations. The data from the hands-on<br />
performance test showed that these students were performing at very low levels. The mean<br />
score for this test was 61.3 (n=105) of 104 possible points. The mean score was<br />
considered below passing in most Navy schools.<br />
The Great Lakes Study.<br />
In June 1987 the Chief of Naval Education and Training (CNET) designated the<br />
Electrician’s Mate (EM) ‘A’ School at Great Lakes, Illinois a “model school.” The goal<br />
was “to apply the best techniques and instructional technologies available... so that we<br />
will have in place curriculum, technologies, and management techniques which reflect<br />
the very best we currently know about teaching and learning.” The Navy Personnel<br />
Research and Development Center (NPRDC) was asked to participate in the model<br />
school effort.<br />
Our first research effort was to evaluate EM ‘A’ school Phase I training, which is<br />
the basic electronics and electricity (BETT) portion of the course. The hands-on<br />
performance test was given to 44 trainees from the first two phases of the course and<br />
23 trainees awaiting initial instruction. The objective was to determine if Phase I and<br />
Phase II trainees could solve practical problems using the knowledge and skills taught in EM<br />
‘A’ and what knowledge a typical trainee brings with him to the school.<br />
*(continued) known of electricity two centuries ago, and tradition dies hard. It makes a poor beginning for<br />
modern electric circuits. Now you need some knowledge of it for atomic physics. How much you<br />
see and learn of this part of physics will depend on apparatus, weather, and instructor. On the<br />
whole the less the better.” From Rogers, E.M. (1977). Physics for the inquiring mind. Princeton,<br />
NJ: Princeton University Press, page 533.<br />
All subjects were given the six practical problems to solve. The mean score for the<br />
two trained groups was 44.6 out of 140 points possible. An additional test item<br />
developed by the EM school staff accounts for the increase in total possible points.<br />
The average trainee had difficulty measuring values in simple electrical devices<br />
using the multimeter. For the most part the trainees know how to use the multimeter.<br />
However, they have difficulty knowing when or where to use it. Further, even after<br />
completing Phase II training, most trainees are not able to accurately interpret meter<br />
readings to identify an open or short, which is fundamental to equipment maintenance.<br />
Trainees did much better recognizing the various electrical components than they<br />
did measuring them. However, less than 50 percent of the Phase I and Phase II trainees were able<br />
to identify a capacitor, and a significant number of Phase I trainees had problems identifying a<br />
battery, conductor, and resistor. After the second phase, component identification<br />
improves significantly.<br />
Alternative Approaches<br />
Over the last 40 years a number of alternative approaches to teaching BETT have been<br />
developed and tested. This section will summarize some of the more significant work in<br />
this area. While none of these projects specifically reports cost data, all of them report<br />
decreases in attrition and setback rate, and some report decreases in course length. All of<br />
the decreases directly relate to cost savings.<br />
The first test was done in the Navy School of Electronics in 1949 (Johnson 1951).<br />
The course that was changed was the basic electricity and electronics for radio, sonar,<br />
and radar maintenance. The results were that the course was shortened from 26 to 18<br />
weeks, attrition as compared to a control group was reduced by 66% and the setback rate<br />
dropped significantly.<br />
The second, was the LIMIT project done by HumRRO in the late 1950s (Goffard,<br />
Heimstra, Beecroft & Oppenshaw 1960). It reorganized the three week basic electricity<br />
section of a field radio repair according to job-oriented training (JOT) principles. A<br />
comparison of conventional students with JOT students showed that the latter achieved<br />
higher test scores.<br />
The third was project REPAIR again done by HumRRO in the late 1950s and early<br />
1960s (Brown et al. 1959, Shoemaker 1960). The course modified was the entire field<br />
radio repair course. Approximately 100 students completed the new field radioman’s<br />
class. When their performance was compared with that of graduates of the traditional<br />
class, it was found that they were “significantly superior” to the traditional class in four<br />
of seven tests administered--troubleshooting, test equipment, repair skills, and<br />
achievement. No improvement was noted on the alignment, manuals, and schematics<br />
tests. An interesting finding was that the experimental course graduates were superior to<br />
the standard course graduates on each of the 8 problems that made up the troubleshooting<br />
test. This is impressive since 3 of the problems involved equipment on which the<br />
experimental course students received only 4 hours of familiarization training, compared<br />
with 38 hours of training for each student in the standard course.<br />
The fourth project was X-ET which was done at NPRDC in the mid 1960s<br />
(Pickering & Anderson 1966, VanMatre & Steinemann 1966, Steinemann, Harrigan, &<br />
VanMatre 1967). An experimental electronics technician (X-ET) course was developed<br />
that differed from ongoing courses in that it accommodated students previously<br />
considered unqualified in terms of aptitude scores and education. Training was oriented<br />
towards job skills and minimized nonessential mathematics and electronics theory. The<br />
results showed that the X-ETs were taught to required levels of proficiency in a<br />
substantially shorter time than in the conventional course. In follow up studies of job<br />
performance it was found that, in general, the X-ETs were performing their duties<br />
satisfactorily in comparison with a control group and on the basis of ratings by<br />
supervisors and peers. They were superior to control ETs in troubleshooting, even<br />
though they scored lower on paper-and-pencil tests of electronics knowledge.<br />
The fifth project, SUPPORT, applied JOT to the Army’s medical corpsman’s course<br />
(Ward, Fooks, Kern & McDonald 1970). (This was not a BETT revision.) The course<br />
was changed from a lecture-based, theory-oriented course to a more job-oriented course<br />
where the content was organized so the relevance of each new topic was readily<br />
apparent. The evaluation revealed that JOT students performed better than<br />
conventionally trained corpsmen in 21 out of 26 tests, including both paper-and-pencil<br />
tests and extensive job-sample, simulated performance tests. In addition, JOT students<br />
were faster than conventional corpsmen in attending to serious battlefield wounds.<br />
There was a project related to the SUPPORT project that was aimed at extending the JOT<br />
methods used in the corpsmen training to radio operator training (Goffard, Polden &<br />
Ward 1970). The findings were that the recycle rate for trainees was reduced by 30<br />
percent in comparison with the standard course, and attrition was reduced by about 50<br />
percent. These outcomes were achieved even though the JOT classes were 40 percent<br />
larger and contained twice as many mental category IV personnel as the standard course.<br />
The final project was APSTRAT (Weingarten, Hungerland, Brennan, & Allred<br />
1971). This project was specifically targeted for low aptitude personnel admitted under<br />
Project 100,000. The findings were that the redesigned Army field wireman’s course had<br />
35 percent less attrition and that set back rates were cut from 30 percent to zero.<br />
Future Direction for BETT Training Development<br />
Based on the findings of the above studies and the results of the recent NPRDC<br />
hands-on practical tests, below are two alternative approaches to BETT training that<br />
could be used to make future Navy technicians better equipped to maintain the<br />
sophisticated weapons systems in tomorrow's Navy.<br />
Develop a job oriented BETT course that is generic to all electrical schools. The<br />
basic electricity front-end that has been added to the ‘A’ schools would be converted<br />
from an abstract, mathematics- and physics-oriented course to one where job-relevant<br />
skills are practiced in a situation of actual use. The current ‘A’ school phases<br />
would remain the same. The new front-end training would build on the trainee’s<br />
knowledge of familiar electrical devices to teach basic electrical operation and<br />
maintenance concepts. The knowledge acquired using these devices should transfer to<br />
the equipment used in the later phases of ‘A’ school and on-the-job. The new job<br />
oriented training would stress hands-on trainee performance. Hands-on experience would<br />
increase from the current twenty-five percent to sixty percent or more of total class time.<br />
The training would be developed, implemented and evaluated at one electrical school to<br />
determine the feasibility of implementing it in other electrical schools.<br />
Develop job oriented electricity theory training using equipment and tasks specific<br />
to each ‘A’ school. The basic electricity theory training would be integrated into the ‘A’<br />
school equipment operation and maintenance lessons. There would be no separate front-end<br />
theory training. The trainee would learn basic electrical operation and maintenance<br />
concepts on the equipment used on-the-job in the fleet or on reasonable simulations. The<br />
training would be sequenced so the easier/more familiar devices would be taught first,<br />
with more difficult concepts and techniques being taught with the more complicated<br />
devices. For example, for initial basic theory training, a radio receiver at ET ‘A’ school,<br />
or the small boat lighting system at EM ‘A’ school could be used to teach students basic<br />
circuit operation, preventive, and corrective maintenance methods. Those simple devices<br />
could be followed with more complicated devices that have more advanced concepts. As<br />
in the generic option, the emphasis would be placed on trainees learning hands-on<br />
practical skills. Laboratory time would increase to allow sufficient time for the trainees<br />
to become skilled in the application of the theories and concepts learned.<br />
References<br />
Brown, G., Zaynor, W., Bernstein, A., & Shoemaker, H. (1959). Development and evaluation of an<br />
improved field radio repair course. HumRRO-TR-58-59. Alexandria, VA: Human Resources<br />
Research Organization.<br />
Goffard, S., Heimstra, N., Beecroft, R., & Openshaw, J. (1960). Basic electronics for minimally<br />
qualified men: An experimental evaluation of a method of presentation. HumRRO-TR-61-60.<br />
Alexandria, VA: Human Resources Research Organization.<br />
Goffard, S., Polden, D., & Ward, J. (1970). Development and evaluation of an improved radio<br />
operator course (MOS 05&X7). HumRRO-TR-70-8. Alexandria, VA: Human Resources<br />
Research Organization.<br />
Johnson, H. (1951). The development of more effective methods of training electronics technicians.<br />
Washington, DC: Working Group on Human Behavior Under Conditions of Military Service,<br />
Research and Development Board, Department of Defense.<br />
Pickering, E., & Anderson, A. (1966). A performance-oriented electronics technician training program:<br />
I. Course development and evaluation. STB 67-2. San Diego: U.S. Naval Personnel Research<br />
Activity.<br />
Shoemaker, H. (1960). The functional context method of instruction. IRE Transactions on Education,<br />
E-3(2), 52-57.<br />
Steinemann, J., Harrigan, R., & VanMatre, N. (1967). A performance-oriented electronics training<br />
program: IV. Fleet follow-up evaluation of graduates of all classes. SRR-68-10. San Diego: U.S.<br />
Naval Personnel Research Activity.<br />
VanMatre, N., & Steinemann, J. (1966). A performance-oriented electronics technician training<br />
program: II. Initial fleet follow-up evaluation of graduates. STB-67-15. San Diego: U.S. Naval<br />
Personnel Research Activity.<br />
Ward, J., Fooks, N., Kern, R., & McDonald, R. (1970). Development and evaluation of an integrated<br />
basic combat/advanced individual training program for medical corpsman (MOS 91A10).<br />
HumRRO-TR-70-1. Alexandria, VA: Human Resources Research Organization.<br />
Weingarten, K., Hungerland, J., Brennan, M., & Allred, B. (1971). The APSTRAT instruction model.<br />
HumRRO-PP-6-71. Alexandria, VA: Human Resources Research Organization.<br />
USING EVENT HISTORY TECHNIQUES TO ANALYZE TASK PERISHABILITY:<br />
A SIMULATION<br />
Stanley D. Stephenson<br />
Southwest Texas State University<br />
Julia A. Stephenson<br />
University of North Texas<br />
Until now, task perishability, the point in time at which a<br />
task drops out of an airman's inventory of tasks performed, has<br />
not been researched. This lack of research could be for two reasons.<br />
First, task performed/not performed is usually of more<br />
interest than is when a task leaves a job inventory. Second,<br />
perhaps measurement techniques for determining task perishability<br />
are either unavailable or unknown. In any event, little is known<br />
about task perishability.<br />
For a variety of reasons, knowledge about task perishability<br />
would be useful, primarily in training. For instance, the decision<br />
about when and where to train a task (formal school or OJT)<br />
could depend on how long the task is going to be used. Perhaps<br />
the most obvious use of task perishability would be in cross-training.<br />
If a task can be determined to have a relatively short<br />
residual life, perhaps training on that task is not necessary,<br />
even though the task is currently in the job inventory of comparable<br />
time-in-grade airmen. Also, task perishability is<br />
obviously related to skill decay. A skill can be retained long<br />
after a task stops being performed. However, if a task perishes<br />
from an airman's inventory of tasks performed, the corresponding<br />
skill will eventually leave that airman's inventory of skills.<br />
Before skill decay can be measured, information about task perishability<br />
should be known.<br />
This paper will study the feasibility of measuring task perishability<br />
using a technique called Event History Analysis.<br />
There are two major features of event history as it applies to<br />
task perishability. First, it incorporates time in the analysis;<br />
i.e., how long did a task stay in an airman's job inventory?<br />
Second, it has the ability to handle censored data. Censored<br />
data is data on which you have only partial information. For<br />
example, not all airmen complete their first-term enlistment. Of<br />
those who leave early, some will have stopped doing a task, but<br />
some will still be performing the task. Consequently, information<br />
about when the task would have left the censored airmen's<br />
job inventories if they had stayed in the Air Force is unknown;<br />
however, that the task "lived" until the point of censoring is<br />
known. Rather than discarding these censored data, event history<br />
incorporates the available information and, although incomplete,<br />
produces more precise estimates of task survivability.<br />
To study task perishability with event history techniques,<br />
this paper used a simulated data base of a type which could be<br />
derived from the data produced by the USAF Occupational Survey<br />
Program and other sources.<br />
USAF Occupational Survey Program<br />
The USAF job analysis program is frequently referred to by<br />
the term, CODAP (or TI/CODAP) (Christal & Weissmuller, 1988).<br />
CODAP usually involves taking a snapshot of the entire work force<br />
at one point in time; i.e., rather than being longitudinal, the<br />
data collected is vertical. Consequently, the data do not provide<br />
information about what an individual airman does over a 20<br />
year career. Instead, CODAP provides information about what all<br />
airmen are doing in groups such as first term, second term, or<br />
career enlistees. Also, the survey is administered to essentially<br />
100 percent of the career field and produces a response<br />
rate of over 80 percent.<br />
Event History Analysis<br />
Event history analysis enables the researcher to determine<br />
probabilities associated with the length of time for a binary,<br />
dependent variable to change states. Another requirement is<br />
knowledge of the time from the start of the experiment to the<br />
change in state of the dependent variable. Both the origin time<br />
and the exact point at which the dependent variable changes must<br />
be precisely defined. Also, the length of time must always be a<br />
positive value. The last assumption is that the sample should be<br />
homogeneous (Cox & Oakes, 1984).<br />
One of the strengths of event history analysis is the ability<br />
to include some information concerning censored data. An item is<br />
considered to be censored if it is removed from the sample before<br />
the experiment is terminated and the dependent variable has not<br />
changed states. A second type of censoring occurs if the experiment<br />
ends before the dependent variable changes. In most parametric<br />
statistical analyses, such data would have to be omitted<br />
from the sample. However, the fact that the item had not changed<br />
at the point of leaving or ending the experiment can provide some<br />
relevant information that should be incorporated into probabilities<br />
associated with the time at which the dependent variable<br />
changes states.<br />
Several probabilities are associated with event history analysis.<br />
The failure and the survival functions represent cumulative<br />
distributions about when the dependent variable changes<br />
states. Failure is defined as the change in the dependent variable;<br />
survival is the lack of change. The hazard function represents<br />
the conditional probability that the dependent variable<br />
will change states in a specific time period, given that it had<br />
not changed states in the previous period (Kalbfleisch & Prentice,<br />
1980). The mean life residual function represents the<br />
average length that the dependent variable will survive beyond<br />
the specified time period (Oakes & Dasu, 1990).<br />
All of these functions are related mathematically. For example,<br />
once the survival curve is estimated, the mean life residual can<br />
be determined. The most widely used method for computing the<br />
survival function is the product limit estimator proposed by<br />
Kaplan and Meier (1958).<br />
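The product-limit idea can be made concrete in a few lines. The sketch below is an illustration under assumptions, not code from the paper, and the ten records are invented:<br />

```python
# Illustrative sketch of the Kaplan-Meier product-limit estimator applied
# to task-perishability records.  Each record is (months_observed, perished);
# perished=False marks a censored airman who separated while still
# performing the task.  The data values below are invented for illustration.

def kaplan_meier(records):
    """Return {perish_time: S(t)}, the estimated survival curve."""
    n_at_risk = len(records)
    survival = 1.0
    curve = {}
    for t in sorted({t for t, _ in records}):
        perished = sum(1 for ti, ev in records if ti == t and ev)
        removed = sum(1 for ti, _ in records if ti == t)
        if perished:
            # Product-limit step: S(t) = S(t-) * (1 - d_t / n_t).
            survival *= 1.0 - perished / n_at_risk
            curve[t] = survival
        n_at_risk -= removed  # perished and censored cases both leave the risk set
    return curve

# Ten airmen: seven observed perish times, three censored at month 48.
data = [(18, True), (24, True), (30, True), (36, True), (42, True),
        (48, False), (48, False), (48, False), (54, True), (60, True)]
curve = kaplan_meier(data)  # e.g., curve[36] is about 0.60
```

Discarding the three censored records instead would bias the estimate downward (S at month 36 falls from 0.60 to about 0.43), which is exactly the distortion that motivates using event history techniques here.<br />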
Method<br />
At first glance, event history analysis does not seem appropriate<br />
for examining Air Force occupational data. One problem is<br />
that the Air Force maintains little data on persons who leave the<br />
service. Also, the actual time that a person stops doing a specific<br />
task is not recorded. However, data gathered by occupational<br />
surveys do meet the required assumptions.<br />
Event history requires that the dependent variable be<br />
binary. For task perishability, this translates to whether or<br />
not a task is being performed. In an occupational survey,<br />
respondents check if they are performing a task; thus, task performance<br />
is known.<br />
The second assumption of event history is that the origin and<br />
exact point at which a task leaves a person's inventory must be<br />
specified. Actually, however, the only information that is necessary<br />
is the length of time that a person holds a specific task<br />
in the job inventory. To meet this requirement, a small mental<br />
transformation of the data is necessary. Occupational surveys<br />
provide information on the percent members performing a task in<br />
each time interval. The difference in percent members performing<br />
over two intervals is in essence a measure of those who have<br />
stopped doing a task. Therefore, occupational survey data meet<br />
the two primary assumptions of event history analysis. However,<br />
a problem is the inclusion of the censored data. While the Air<br />
Force does have information regarding AFSC attrition rates,<br />
whether the specific task is in an airman's inventory when he<br />
leaves the service (attrites) is unknown.<br />
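The transformation described above is simple differencing; a toy example with invented survey percentages:<br />

```python
# Toy illustration (invented percentages, not survey data): the drop in
# "percent members performing" between consecutive survey intervals
# estimates the share of airmen who stopped performing the task.

percent_performing = {12: 100.0, 24: 91.0, 36: 80.0, 48: 54.0, 60: 47.0}

months = sorted(percent_performing)
stopped = {later: percent_performing[earlier] - percent_performing[later]
           for earlier, later in zip(months, months[1:])}
# stopped[48] == 26.0: an estimated 26 percent of members stopped doing
# the task between the month-36 and month-48 surveys.
```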
For the purposes of this study, we generated a 1000 person<br />
data base. This data base included actual data points for a task<br />
leaving an airman's job inventory, as well as censored data,<br />
which simulated those airmen who leave the Air Force prior to the<br />
task leaving their inventory. While this model is not specific<br />
to any career field, it does incorporate several facts which are<br />
intrinsic to the job/career development in the Air Force. For<br />
instance, many airmen spend up to 12 months in training before<br />
actually being assigned to a workplace. Thus, this model starts<br />
simulating at the thirteenth month, which is actually the first<br />
point in time that a task could leave an incumbent's inventory.<br />
Another consideration is the large change in status at the 48th<br />
month. At this point many airmen leave the service; of those who<br />
do continue in the Air Force, some change career fields. This<br />
change results in many censored data points at the 48th month.<br />
In summary, single task performance data for an initial set<br />
of 1000 airmen was simulated over a 6-year (72-month) period.<br />
Using the type of data available from Occupational Survey<br />
Reports, percent members performing for each month interval were<br />
created. Censoring was also generated for this simulation.<br />
Although exact censoring data cannot be determined from current<br />
Air Force data bases, historical attrition data are available.<br />
The censored data values can then be estimated from the attrition<br />
data using the information from current percent members performing<br />
a single task. A total of 300 (30%) censored data points<br />
were inserted into the data base using a random number procedure.<br />
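A data base of this kind can be sketched as follows. This is a minimal illustration under assumed distributions (exponential perish times, half of the censoring concentrated at the 48th month); it is not the authors' actual generator:<br />

```python
# Hedged sketch of a simulated task-perishability data base: 1000 airmen,
# roughly 30% censored, perish times no earlier than month 13, and a
# spike of censoring at the 48-month first-term point.  The distributional
# choices are assumptions made for illustration.
import random

def simulate(n=1000, seed=1990):
    """Generate (months_observed, perished) records."""
    random.seed(seed)
    records = []
    for _ in range(n):
        if random.random() < 0.30:
            # Censored: airman separates or changes career fields while
            # still performing the task; half censor exactly at month 48.
            censor = 48 if random.random() < 0.5 else random.randint(13, 72)
            records.append((censor, False))
        else:
            # Perished: the task cannot leave an inventory before month 13.
            perish = 13 + int(random.expovariate(1 / 24.0))
            records.append((min(perish, 72), True))
    return records

records = simulate()  # 1000 airmen observed over months 13-72
```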
From this simulated data base, three functions were calculated:<br />
the Survival function, the Hazard function, and the Mean<br />
Life Residual function. All calculations were performed using<br />
the LIFETEST procedure in SAS. Examples of the survival and the<br />
mean life residual functions are given in this paper.<br />
Results<br />
Figure 1 shows the survival function for the simulated data<br />
base. It represents the probability of an airman at a specific<br />
time period performing the task. For example, at the 36th month,<br />
the probability of an airman still performing this task is .54.<br />
Figure 2 represents the mean life residual function for the<br />
simulated data base. This function can be interpreted as the<br />
average length that an airman will be performing the task beyond<br />
a specific time period. At the 36th month, on average, an airman<br />
will be performing this task 13.8 more months.<br />
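For integer-month data, the mean life residual can be read off the survival curve via the discrete-time identity MRL(t) = sum over u >= t of S(u)/S(t). A hedged sketch using an invented survival curve (not the paper's simulated one):<br />

```python
# Sketch of the discrete-time identity MRL(t) = sum_{u>=t} S(u) / S(t),
# where S(u) is the probability of still performing the task after month u.
# The geometric curve below (2% of performers stop each month) is invented
# for illustration.

def mean_residual_life(S, t, horizon=2000):
    """Average additional months of task performance, given survival to t."""
    return sum(S(u) for u in range(t, horizon)) / S(t)

S = lambda u: 0.98 ** u          # invented: 2% of performers stop per month
mrl = mean_residual_life(S, 36)  # about 50 months
```

Because a geometric curve is memoryless, this toy MRL is constant over t; the simulated curve in Figure 2 instead declines with month of service.<br />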
[Figure 4. Mean Life Comparison: mean life residual in months, plotted by month of service (12-72), for the data base including censored cases versus the data base omitting all censored cases.]<br />
Figures 3 and 4 show a comparison between the data base with<br />
all 1000 airmen (event history analysis) and the data base with<br />
700 airmen (i.e., all censored data omitted). The difference in<br />
the two survival functions (figure 3) is greatest at the 48th<br />
month, the point at which censoring is heaviest.<br />
The difference between the two mean life residual functions<br />
(Figure 4) is greatest at the beginning of the 13th month, basically<br />
because excluding the 300 censored data points removes some<br />
information about how long a task is performed. At the 48th<br />
month the two curves become very similar. Thus, censoring after<br />
the first term has less of an effect on the mean life residual<br />
function.<br />
These data could also be presented in table format. A portion<br />
of these functions is shown in Table 1.<br />
Table 1<br />
Comparison Data<br />
          Survival Function       Mean Life Residual<br />
Month     Include     Omit        Include     Omit<br />
          Censors     Censors     Censors     Censors<br />
36        .544        .377        13.307      9.273<br />
37        .527        .364        13.264      8.871<br />
38        .516        .340        12.647      8.2U<br />
39        .496        .313        12.078      7.969<br />
40        .479        .293        11.472      7.602<br />
Discussion<br />
The results of this study show that event history analysis<br />
can be used to investigate task perishability. Due to the method<br />
of collecting task data in the Air Force's Occupational Survey<br />
Program, accurate figures can be obtained for the change in state<br />
of the binary variable, i.e., task perishability. Historical<br />
attrition data are available for all career fields. Thus, censoring<br />
is the only unknown variable, and it can be accurately<br />
estimated by combining occupational and attrition data. Therefore,<br />
an appropriate data base can be created for any AFSC.<br />
The results of the analysis also show the advantage of using<br />
event history to analyze task perishability. Figures 3 and 4<br />
vividly illustrate the difference between analyzing task perishability<br />
[Figure 1. Survival Function: probability of an airman still performing the task, by month.]<br />
using event history analysis, which can accommodate censored<br />
data, and using conventional analytical procedures, which essentially<br />
discard censored data. Estimations of both the survival<br />
and mean residual life functions are more accurate using event<br />
history analysis. Therefore, the results of this study strongly<br />
suggest that analyzing task perishability with event history<br />
techniques should continue to be studied.<br />
The use of event history analysis to examine occupational<br />
data, such as task perishability, is a new application of this<br />
statistic. Thus, several research issues need further examination.<br />
Of primary concern is the inclusion of censored data. As<br />
mentioned earlier, the Air Force does not maintain records of the<br />
tasks performed by persons who attrite. Therefore, determining<br />
the number of censored data points at each interval will have to<br />
be modeled. A logical start point would be to use the known<br />
information on percent (of those who complete the occupational<br />
survey) members performing as an estimation of the percentage of<br />
those who attrited but still held the task in their inventories.<br />
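That start point can be made concrete with a small sketch. The allocation rule below is the suggestion just described, not an established Air Force procedure, and all counts are invented:<br />

```python
# Hypothetical allocation: spread each interval's known attrition count in
# proportion to the percent of survey respondents still performing the
# task, yielding an estimated number of censored records per interval.
# All numbers are invented for illustration.

attrition = {24: 40, 36: 30, 48: 400}            # airmen leaving, per interval
pct_performing = {24: 0.90, 36: 0.54, 48: 0.45}  # from occupational survey

censored = {m: round(attrition[m] * pct_performing[m]) for m in attrition}
# e.g., of the 400 airmen leaving at month 48, an estimated 180 still held
# the task and would enter the analysis as censored data points.
```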
The assumption that 100% of the airmen are performing the<br />
task at the start of the career field raises a potential theoretical<br />
issue. However, the math underlying the model is primarily<br />
based on conditional probabilities; thus, deviating from this<br />
assumption would not seem to have a severe effect on the task<br />
performance probabilities. Another theoretical question concerns<br />
the homogeneity of the persons in a particular career field. A<br />
more accurate analysis of when a task leaves an airman's job<br />
inventory may be accomplished by sub-grouping the career field<br />
with a covariate such as present grade, skill-level, or gender.<br />
Also, some tasks may perish more rapidly for airmen who are in<br />
their second career field. These and other theoretical issues<br />
need to be researched.<br />
An area of interest for further research is task emergence.<br />
The model set forward in this study could easily be restructured<br />
to analyze when a task enters a job inventory. A strength of<br />
this type of analysis is that it would provide information on a<br />
continuum, by month, instead of chunking by 1st term, 2nd term,<br />
etc. Perhaps task emergence and task perishability could be linked<br />
to provide more information on when and by whom a task, or group<br />
of tasks, is performed in a career field.<br />
References<br />
Christal, R. E., & Weissmuller, J. J. (1988). Job-task inventory<br />
analysis. In S. Gael (Ed.) The job analysis handbook for business,<br />
industry, and government (Vol II), pp. 1036-1050. New<br />
York: Wiley.<br />
Cox, D. R., & Oakes, D. (1984). Analysis of survival data. New<br />
York: Chapman and Hall.<br />
Kalbfleisch, J. D., & Prentice, R. L. (1980). The statistical analysis<br />
of failure time data. New York: Wiley.<br />
Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from<br />
incomplete observations. Journal of the American Statistical<br />
Association, 53, 457-481.<br />
Oakes, D., & Dasu, T. (1990). A note on residual life. Biometrika,<br />
77, 409-410.<br />
A FIRST LOOK AT THE EFFECT OF INSTRUCTOR BEHAVIOR<br />
IN A COMPUTER-BASED TRAINING ENVIRONMENT<br />
STANLEY D. STEPHENSON<br />
SOUTHWEST TEXAS STATE UNIVERSITY<br />
Computer-based Training (CBT) research has typically focused<br />
on comparing a CBT course with a corresponding traditional<br />
instruction (TI) course. Compared to a similar TI course, CBT<br />
generally, but not always, produces increases in learning and<br />
retention while concurrently requiring less time than TI<br />
(Fletcher & Rockway, 1986; Goodwin et al., 1986; Kulik & Kulik,<br />
1986, 1987; McCombs et al., 1984; O'Neil, 1986). However, CBT<br />
results have not always been positive; there are many instances<br />
in which CBT did not produce increases in performance or<br />
decreases in learning time (Goodwin et al., 1986; McCombs et al.,<br />
1984). In general there has been very little research on maximizing<br />
performance within a CBT system (Gillingham & Guthrie, 1987).<br />
Conversely, there is a long history of research on variables<br />
which influence achievement in TI systems. One of the most<br />
researched variables is instructor behavior. TI research has<br />
produced a relatively high degree of consensus as to what an<br />
effective instructor does versus what a not-so-effective instructor<br />
does, with effective being defined in terms of academic<br />
achievement (Brophy, 1986; Brophy & Good, 1986; Rosenshine,<br />
1983). Yet, CBT research has neglected the role of the instructor<br />
(Moore, 1988). Therefore, little is known about whether or<br />
not TI instructor variables transfer to CBT.<br />
In one of the few studies which did examine the role of the<br />
CBT instructor, Moore (1988) found that students who had positive<br />
teachers scored significantly higher than those in classes with<br />
negative teachers. McCombs et al. (1984) reviewed various early<br />
CBT courses and found that two factors were critical to the success<br />
of the CBT courses. These were: (a) adequate opportunities<br />
for student-instructor interactions, and (b) the incorporation of<br />
group activities with individualized training. McCombs (1985)<br />
reviewed the role of the instructor in CBT from a theoretical<br />
perspective and developed several practical suggestions for<br />
instructor use. One of her suggestions was that "...instructors<br />
must have meaningful roles in the management and facilitation of<br />
active student learning, if the CBT system is to be maximally<br />
effective" (p. 164).<br />
As noted in McCombs et al. (1984), student-instructor interaction<br />
was a critical factor with regard to success of a CBT system.<br />
This is a significant finding since one of the most consistently<br />
reported positive TI instructor behaviors is frequent but<br />
short student-instructor interactions; i.e., an increase in student-instructor<br />
interactions produces an increase in achievement<br />
(Brophy, 1986; Brophy & Good, 1986; Rosenshine, 1983). Therefore,<br />
a TI instructor behavior which may transfer to CBT is student-instructor<br />
interaction.<br />
The purposes of this study were two-fold. First, this study<br />
was an attempt to begin to explore the effect of instructor<br />
behavior on achievement in CBT. Second, this study specifically<br />
examined the effect of student-instructor interaction in CBT.<br />
Based on the TI instructor literature, it was hypothesized that<br />
increased student-instructor interaction would produce increased<br />
achievement.<br />
Method<br />
Subjects<br />
Subjects were 25 (15 female and 10 male) college juniors and<br />
seniors enrolled in a Business Statistics class. As part of a<br />
project designed to teach students how to use computer spreadsheet<br />
software to perform statistical computations, Ss volunteered<br />
to participate in a spreadsheet tutorial for extra credit.<br />
The extra credit was awarded for project completion, not for<br />
project performance. All Ss completed a survey to assess their<br />
personal computer (PC) literacy.<br />
Experimental Materials<br />
The spreadsheet tutorial was part of a larger commercial<br />
software tutorial package designed for an integrated spreadsheet/<br />
word processing/database program. The tutorial is basically linear<br />
and learner-controlled; however, Ss did have the capability<br />
to repeat a lesson if desired.<br />
For the purposes of this study, the larger tutorial was modified<br />
to include just the introduction to the integrated package<br />
plus that portion of the tutorial software devoted to the use of<br />
the spreadsheet. The introduction portion (Part A) contained<br />
four lessons, and the spreadsheet portion (Part B) contained<br />
eight lessons. The tutorials were run on Tandy 1000SX PCs.<br />
An exercise designed to evaluate mastery of the spreadsheet<br />
tutorial commands was added to the experimental software. Since<br />
the students were volunteers from a Business Statistics class,<br />
the exercise used simple statistical concepts as the vehicle for<br />
evaluating spreadsheet mastery. Consequently, the experimental<br />
material consisted of a CBT spreadsheet tutorial modified to<br />
include a statistics-based exercise.<br />
Procedure<br />
Ss were randomly assigned by spreadsheet/PC literacy to one<br />
of two student-instructor interaction modes. Group I (n=13) had<br />
essentially no instructor-initiated interactions. All Group I<br />
interactions were initiated by the student and consisted of<br />
requests by the students for help in overcoming an obstacle in<br />
the tutorial. Group II (n=12) experienced the same type of student-initiated<br />
interactions experienced by Group I. In addition<br />
Group II was exposed to multiple instructor-initiated interactions.<br />
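The literacy-balanced assignment might be sketched as pairing Ss by survey score and splitting each pair at random. The helper and the scores below are hypothetical; the paper does not describe its exact matching procedure:<br />

```python
# Hypothetical sketch of blocked random assignment: rank subjects by
# PC-literacy score, then randomly split each adjacent pair between the
# two interaction conditions.  Subject ids and scores are invented.
import random

def assign_by_literacy(scores, seed=7):
    """Map each subject to condition 'I' or 'II', balanced on literacy."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)  # order subjects by literacy
    groups = {}
    for i in range(0, len(ranked), 2):
        pair = ranked[i:i + 2]               # adjacent-literacy pair
        rng.shuffle(pair)                    # coin flip within the pair
        for subject, group in zip(pair, ("I", "II")):
            groups[subject] = group
    return groups

# 25 hypothetical subjects, matching the study's n; the odd subject out
# lands in Group I, giving the study's 13/12 split.
scores = {f"S{k}": (k * 7) % 10 for k in range(25)}
groups = assign_by_literacy(scores)
```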
Both groups worked the CBT tutorial in three sessions. In<br />
session one, all Ss started on lesson A1 and worked in the tutorial<br />
for 90 minutes. In the second session, all Ss started on<br />
lesson B1 and worked through the last lesson, B8. In the third<br />
session, all Ss started on lesson B3 and again worked through the<br />
last lesson, lesson B8. Consequently, all Ss had a single exposure<br />
to lessons A1 through A4 and repeated exposure to lessons B1<br />
through B8. Also, since each S went at his/her own speed, Ss'<br />
total time on task varied. At the completion of lesson B8 on day<br />
3, all Ss were given an exercise designed to evaluate their mastery<br />
of the tutorial material. Ss had 30 minutes to work on the<br />
exercise.<br />
During the startup period of the project (i.e., the first 15<br />
minutes of the first session), the instructor responded to all<br />
questions in both groups to ensure that the Ss were properly<br />
logged into the tutorial. For both groups, the instructor also<br />
responded to all student-initiated interactions with one or more<br />
of three responses: (1) "Try pushing the [ESCAPE] key;" (2) "Try<br />
pushing the [SPACE] bar;" or (3) "Re-boot the system and start<br />
over." These suggestions were given in sequence; e.g., if "Try<br />
pushing the [ESCAPE] key," did not work, then the S was told to<br />
"Try pushing the [SPACE] bar." For Group I Ss, these suggestions<br />
were the only instructor interactions experienced after the first<br />
15 minutes of session one. In a sense, Group I's instructor performed<br />
an impersonal, course administrator role.<br />
In addition to the interactions listed above, Group II Ss<br />
also experienced instructor-initiated interactions. In the first<br />
session, the instructor initiated four interactions with each S.<br />
In sessions two and three, the instructor initiated three and one<br />
interactions, respectively. These interactions were related to<br />
the location of keys on the Tandy keyboard. For example, shortly before<br />
needing to use the Back Slash (\) key, the instructor would tell<br />
the students where that key was located. Key location was<br />
explained and diagrammed in instructions given to all Ss, but for<br />
most students key location on the Tandy keyboard was a minor<br />
problem due to previous exposure to an IBM keyboard. Instructor-initiated<br />
interactions lasted between 5 and 10 seconds.<br />
It should be noted that in no instance did the instructor<br />
provide information which was not available to the student elsewhere.<br />
Also, in no instance did the instructor comment, provide<br />
feedback, or give praise on the student's performance on the<br />
tutorial.<br />
Dependent Measures<br />
Two dependent measures were recorded. First, the Ss' performance<br />
on the exercise was scored. Second, Ss also recorded<br />
which spreadsheet commands they used. Since most procedures can<br />
be performed in more than one way (e.g., a cell entry can be<br />
changed via an EDIT command or by simply re-typing the entry),<br />
this second measure was recorded to assess how many different<br />
spreadsheet commands were actually used during the exercise.<br />
Results<br />
Means and standard deviations for Spreadsheet Performance and<br />
Use of Spreadsheet Commands are given in Table 1. Due to the<br />
small sample sizes (and possible problems with the assumption of<br />
normality), the Mann-Whitney U non-parametric test statistic was<br />
used to analyze differences between Group I (no instructor-initiated<br />
interaction) Ss and Group II (instructor-initiated<br />
interaction) Ss.<br />
Table 1<br />
Spreadsheet Performance and Use of Spreadsheet Commands<br />
Means and Standard Deviations<br />
Spreadsheet Performance<br />
Mean SD<br />
Group I (No Interaction) 58.000 18.257<br />
Group II (Interaction) 72.417 7.403<br />
Use of Spreadsheet Commands<br />
Mean SD<br />
Group I (No Interaction) 32.308 7.250<br />
Group II (Interaction) 30.833 8.483<br />
Exercise Performance<br />
Group II (instructor-initiated interaction) Ss significantly<br />
outperformed Group I (no instructor-initiated interaction) Ss<br />
(Mann-Whitney U = 34.50, p = .017).<br />
Use of Spreadsheet Commands<br />
There was no difference in command usage between Group I Ss<br />
and Group II Ss (Mann-Whitney U = 82.00, p = .824).<br />
Sex Differences<br />
Sex differences were not significant (for Spreadsheet Performance,<br />
Mann-Whitney U = 56.00, p = .289; for Use of Spreadsheet<br />
Commands, Mann-Whitney U = 69.50, p = .755).<br />
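The group comparisons above can be reproduced with a small Mann-Whitney U computation. The sketch below is illustrative only (the scores are invented, not the study's data); it counts, over all between-group pairs, how often one group's score exceeds the other's, and adds the large-sample normal approximation for the test statistic.

```python
# Illustrative Mann-Whitney U computation (hypothetical scores, not the
# study's data). U counts pairwise wins of one group over the other;
# the z value uses the normal approximation for moderate sample sizes.
import math

def mann_whitney_u(x, y):
    """Return (U, z): the smaller U statistic and its normal approximation."""
    # Count, over all (x, y) pairs, how often an x-score beats a y-score
    # (ties count one half).
    u_x = sum(1.0 if xi > yi else 0.5 if xi == yi else 0.0
              for xi in x for yi in y)
    n1, n2 = len(x), len(y)
    u_y = n1 * n2 - u_x
    u = min(u_x, u_y)
    mu = n1 * n2 / 2.0                                  # mean of U under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)   # SD of U under H0
    return u, (u - mu) / sigma

group_i = [40, 55, 62, 48, 70, 58]    # hypothetical exercise scores
group_ii = [66, 74, 71, 80, 69, 77]
u, z = mann_whitney_u(group_i, group_ii)
print(u, round(z, 2))
```

A large negative z (the observed U far below its null mean) corresponds to the kind of group difference reported for Spreadsheet Performance.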
Discussion<br />
The hypothesis that increased student-instructor interaction<br />
would lead to increased achievement was supported. Given the<br />
limited length of the CBT program used in this experiment, the<br />
size of the achievement difference between the two<br />
groups was surprising. For some reason, having the instructor<br />
interact with, take notice of, or care about the student affected the<br />
student to the point where it increased his/her achievement.<br />
The underlying cause for the difference in achievement did not<br />
seem to be knowledge. All Ss seemed to "learn" the commands presented<br />
in the tutorial; there was no difference between groups in<br />
the number of commands used to solve the exercise. The difference<br />
was in how well the commands were used.<br />
Nor was the difference in achievement due to praise or<br />
feedback. Neither group received praise for their performance.<br />
Unless relatively brief human interaction is defined as praise,<br />
praise was not a factor in this study. Extra credit for higher<br />
performance on the exercise also was not a factor; all Ss<br />
received the same amount of extra credit regardless of their performance.<br />
A clue as to why Group I Ss did not perform as well as Group<br />
II Ss comes from observations made by the instructor. It<br />
seemed that Group I Ss used the space bar more frequently than<br />
did Group II Ss. In this study's tutorial, Ss had the capability<br />
to literally space-bar their way through the tutorial. That is,<br />
rather than actually performing the requested tutorial action, Ss<br />
could depress the space bar and step through the program.<br />
Although not measured, Group I Ss seemed to take this approach<br />
more frequently. Consequently, while both groups were equally<br />
exposed to the material, Group II Ss seemed to actually perform the<br />
tutorial more. This 'space-bar' behavior could account for the<br />
difference in achievement. The difference in standard deviation<br />
between the two groups could also be a result of the reduced<br />
practice by Group I Ss.<br />
If the explanation offered above is accurate, it suggests<br />
that brief human interaction serves to keep students on task more<br />
so than no human interaction. If no one is aware of what I am<br />
doing, I am more likely to try to ease my way through the CBT<br />
course. However, if someone is aware of what I am doing, irrespective<br />
of whether or not that someone gives me praise or feedback,<br />
then I had best stay on task.<br />
Due to the manner in which the Group II interactions<br />
occurred, instructor monitoring of the students was confounded<br />
with interaction. For the instructor to know when to interact<br />
with an appropriate comment, the instructor had to know when a<br />
student was approaching a particular point in the tutorial. In<br />
order to know this, the instructor had to constantly monitor the<br />
students' progress. Consequently, while the Group I instructor<br />
sat at a desk and waited for students to request assistance, the<br />
Group II instructor was constantly walking around the room and<br />
visually checking on where Ss were in the tutorial. Therefore,<br />
it may be that monitoring, and not interaction, was the basis for<br />
Group II's higher achievement.<br />
These results add to the results reported by Moore (1988), who<br />
found that, even in CBT, teachers with positive attitudes produced<br />
higher achievement than teachers with negative attitudes.<br />
Evidently, instructor interaction can also affect achievement.<br />
Whether or not the interaction needs to be tied to course content<br />
is unknown. In this study, the instructor's comments were not<br />
content-based. Therefore, it may be that CBT instructors should<br />
interact with students in order to maximize achievement, but that<br />
the interactions may not need to be related to the material being<br />
covered.<br />
Implications<br />
The relatively short-term nature of the tutorial used in this<br />
experiment obviously limits the generalization of this study's<br />
results. That limitation notwithstanding, the specific conclusion<br />
from this study is that brief instructor-initiated interactions<br />
can increase achievement in CBT. However, instructor monitoring<br />
without interaction may produce the same result.<br />
Since the role of the instructor in CBT is frequently undefined,<br />
the results from this study give some direction as to what<br />
a CBT instructor can do to influence achievement. Moreover,<br />
since instructor-initiated interactions are controlled by the<br />
instructor, these interactions can be both built into the larger<br />
learning system (which includes the CBT subsystem) and also<br />
included in the instructor evaluation system.<br />
A larger implication from this study is that instructor<br />
behavior does seem to influence achievement in CBT. The results<br />
obviously support Moore's (1988) research and McCombs' (1985) suggestions.<br />
There is simply something about having another human<br />
around and aware of your actions that alters your behavior. Even<br />
in the best designed, best built, and best implemented CBT systems,<br />
instructor behavior may still influence achievement.<br />
Rather than trying to design a CBT system which does away with<br />
the instructor (or to design a system which essentially ignores<br />
the instructor), CBT developers should try to find ways in which<br />
to use instructor presence to maximize achievement.<br />
References<br />
Brophy, J. E. (1986). Teacher influences on student achievement.<br />
American Psychologist, October, 1069-1077.<br />
Brophy, J. E., & Good, T. L. (1986). Teacher behavior and<br />
student achievement. In M. C. Wittrock (Ed.), Third handbook<br />
of research on teaching (pp. 328-375). New York: Macmillan.<br />
Fletcher, J. D., & Rockway, M. R. (1986). Computer-based education<br />
in the military. In J. A. Ellis (Ed.), Military contributions<br />
to instructional technology. New York: Praeger.<br />
Gillingham, M. G., & Guthrie, J. T. (1987). Relationships<br />
between CBT and research on teaching. Contemporary Educational<br />
Psychology, 12, 189-199.<br />
Goodwin, L. D., Goodwin, W. L., Nansel, A., & Helms, C. P.<br />
(1986). Cognitive and affective effects of various types of<br />
microcomputer use by preschoolers. American Educational<br />
Research Journal, 23, 348-356.<br />
Kulik, C. C., & Kulik, J. A. (1986). Effectiveness of computer-based<br />
education in colleges. AEDS Journal, Winter/Spring,<br />
81-108.<br />
Kulik, J. A., & Kulik, C. C. (1987). Review of recent<br />
research literature on computer-based instruction. Contemporary<br />
Educational Psychology, 12, 222-230.<br />
McCombs, B. L. (1985). Instructor and group process roles in<br />
computer-based training. Educational Communication and Technology<br />
Journal, 33, 159-167.<br />
McCombs, B. L., Back, S. M., & West, A. S. (1984). Self-paced<br />
instruction: Factors critical to implementation in Air Force<br />
technical training - A preliminary inquiry (AFHRL-TP-84-23).<br />
Lowry Air Force Base, CO: Air Force Human Resources Laboratory,<br />
Training Systems Division.<br />
Moore, B. M. (1988). Achievement in basic math skills for low<br />
performing students: A study of teachers' affect and CAI. The<br />
Journal of Experimental Education, 5, 38-44.<br />
O'Neil, H. F., Anderson, C. L., & Freeman, J. A. (1986).<br />
Research in teaching in the Armed Forces. In M. C. Wittrock<br />
(Ed.), Third handbook of research on teaching (pp. 971-987). New<br />
York: Macmillan.<br />
Rosenshine, B. (1983). Teaching functions in instructional<br />
programs. The Elementary School Journal, 83, 335-351.<br />
Transfer of Training with Networked Simulators¹<br />
David W. Bessemer<br />
U.S. Army Research Institute<br />
Field Unit-Fort Knox, Fort Knox, Kentucky<br />
The Armor Officer Basic (AOB) Course in the Fort Knox Armor<br />
School includes three weeks of tactical instruction followed by<br />
ten days of Mounted Tactical Training (MTT) in the field. During<br />
MTT, students rotate among tank crew and unit leader positions as<br />
they perform platoon mission exercises. Late in 1988, two days<br />
of similar training in networked tank simulators were added just<br />
before the MTT. Additional platoon movement training using<br />
wheeled vehicles also began with the next class after simulator<br />
training started. These changes set up a quasi-experimental<br />
comparison between baseline classes that graduated before the<br />
changes and later classes that received added training. Student<br />
records provided performance measures in an interrupted time-series<br />
design (Cook & Campbell, 1979) that permitted transfer<br />
from simulator training to field performance to be assessed.<br />
The simulator networking (SIMNET) system used for the AOB<br />
training was produced as a test-bed for Defense Advanced Research<br />
Projects Agency R&D on technologies capable of large-scale<br />
interactive simulation of land combat. Training devices using<br />
these technologies could provide increased collective training<br />
for units, while reducing factors such as cost, time, and maneuver<br />
space that now restrict combined arms training. However,<br />
unit training in simulators must be shown to be effective to<br />
justify further development and acquisition of networked simulator<br />
training devices. Evidence supporting the effectiveness of<br />
SIMNET training for some platoon tasks was obtained in a test<br />
with a small number of units (Gound & Schwab, 1988). Results<br />
reported here supplement the previous findings by specifically<br />
examining officer training for platoon leadership. An important<br />
issue in interpreting the results was whether SIMNET training<br />
caused the observed effects or if other factors contributed, such<br />
as the added wheeled vehicle training.<br />
Method<br />
Sample<br />
One group of 1098 students was enrolled in 24 AOB classes<br />
graduating in a 68-week baseline period. Another group of 607<br />
students came from 12 later classes in a 33-week period after<br />
tactical training in SIMNET was added to the course. There were<br />
one to five student platoons in a class, adding up to 71 platoons<br />
¹The views, opinions, and findings contained in this paper<br />
are those of the author, and should not be construed as the<br />
official position of the U.S. Army Research Institute or as an<br />
official Department of the Army position, policy, or decision.<br />
in baseline classes and 39 platoons in SIMNET-trained classes.<br />
Platoons were supervised by a group of 16 officer and senior<br />
noncommissioned officer (NCO) instructors, called Team Chiefs,<br />
each assisted by a team of NCO tank crew instructors. Every<br />
platoon had one Team Chief guiding all of its tactical training.<br />
Equipment<br />
SIMNET Training. Training was conducted in the Combined<br />
Arms Tactical Training Center (CATTC) at Fort Knox that houses<br />
the SIMNET system. AOB classes used four Ml tank modules per<br />
platoon with a terrain data base portraying the Fort Knox areas<br />
used for AOB field training. Vehicle crews operate SIMNET<br />
modules interactively through a local area computer network (LAN)<br />
in a manner similar to real vehicles. Scenes shown in simulated<br />
sights and vision blocks respond to control inputs to create the<br />
illusion of moving and fighting on the battlefield. Crews can<br />
detect and shoot enemy vehicles, and communicate both within the<br />
crew and to other vehicles and organizations. Operating together<br />
as a unit, crews can use many standard tactical techniques to<br />
execute a combat mission.<br />
Field Training. Each AOB student crew in SIMNET-trained<br />
classes (except for the first such class) used High Mobility<br />
Multi-Purpose Wheeled Vehicles (HMMWVs) for some MTT-like preparatory<br />
training on cavalry operations. All student crews used an<br />
M60A3 tank (U.S. Department of the Army, 1979) and basic issue<br />
items furnished with the tank during MTT.<br />
Training Procedure<br />
SIMNET Exercises. In the first day of simulator training,<br />
the students were introduced to the operation of SIMNET tank<br />
modules, and conducted a tactical road march mission as a tank<br />
company. Platoons then practiced techniques of movement and<br />
battle drills, and performed a movement to contact mission<br />
against static unreactive target vehicles placed on the terrain.<br />
Two force-on-force (FOF) exercises were completed on the next<br />
day, with pairs of platoons alternating in offensive and defensive<br />
roles. For every exercise, the platoon Team Chief selected<br />
two students to act as platoon leader and platoon sergeant. The<br />
Team Chief gave these students a company-level mission order, and<br />
allowed them about an hour to plan and prepare the platoon<br />
mission. The Team Chief controlled the execution of the mission<br />
by acting in the role of company commander. After an exercise,<br />
the Team Chief led an after-action review (AAR) in which the<br />
platoon assembled to discuss strong and weak points exhibited in<br />
planning and executing the mission. After a FOF exercise, the<br />
opposing platoons met for a joint AAR.<br />
Field Exercises. Student platoons completed from two to<br />
four on-tank exercises per day during MTT. For several days the<br />
exercises were relatively elementary, gradually increasing in<br />
complexity and difficulty. Initially, the exercises consisted of<br />
road marches and unopposed cross-country movement. Then there<br />
were several movement to contact and other simple offensive<br />
missions with light simulated enemy contact. Defensive exercises<br />
began the relatively advanced level of training. Complex offense<br />
and defense mission exercises were intermingled in the later days<br />
of the MTT. The student platoons were in the field continuously<br />
during the 10-day MTT training period. The students' positions<br />
in crews were rotated frequently, and new individuals were chosen<br />
to serve as platoon leader, platoon sergeant, and TCs after most<br />
exercises. Usually each student served once in both the platoon<br />
leader and platoon sergeant positions during the MTT in either<br />
order. The sequence of events in field exercises was like that<br />
used in SIMNET. The Team Chief gave a company mission order, and<br />
then the leader and sergeant planned, prepared, and executed the<br />
platoon mission under the command of the Team Chief.<br />
Measures<br />
Crew instructors rated performance of students acting as<br />
platoon leaders or platoon sergeants in the field exercises, with<br />
final review and approval of the ratings by the Team Chief.<br />
Elements of planning, movement and control, and conduct of the<br />
operation were rated on a three-point scale. More than 80% of<br />
the ratings were in the middle (average or satisfactory) category,<br />
showing a strong central bias. Ratings coded as 1, 0, and -1<br />
were averaged for 17 items, omitting items judged "not applicable"<br />
in a particular exercise, to form a field performance index<br />
ranging between ±100, with zero set at the middle scale category.<br />
The number of ratings was also used to indicate the relative<br />
number of field exercises completed in a platoon. The number of<br />
ratings roughly corresponds to twice the number of exercises.<br />
Separate counts were made for the categories of elementary<br />
movement and contact exercises, and for advanced exercises.<br />
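The field performance index described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' scoring program; the ratings below are invented.

```python
# Sketch (hypothetical data) of the field performance index: each item is
# rated below average (-1), average (0), or above average (+1); items
# judged "not applicable" (None) are omitted; the mean is scaled so the
# index ranges between -100 and +100, with zero at the middle category.
def performance_index(ratings):
    """ratings: list of -1/0/+1 values, with None for 'not applicable'."""
    scored = [r for r in ratings if r is not None]
    return 100.0 * sum(scored) / len(scored)

example = [1, 0, 0, -1, 1, None, 0, 1]  # invented ratings for one exercise
print(performance_index(example))
```

With over 80% of ratings in the middle category, most index values cluster near zero, which is consistent with the central bias the authors note.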
At course graduation, the crew instructors evaluated overall<br />
leadership qualities exhibited by the students during the platoon<br />
tactics phase of the AOB course. Team Chiefs also reviewed and<br />
approved these Comprehensive Student Evaluation ratings. The<br />
ratings showed a strong ceiling effect, with over 90% of the<br />
ratings judged in the highest category ("yes," indicating the<br />
student possessed the rated quality) on a three-point scale. The<br />
platoon average percentage of 10 items given the "yes" rating was<br />
used as a measure of graduate quality. The inverse sine transformation<br />
was applied to the percentages before analysis.<br />
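The inverse sine transformation can be illustrated briefly. The paper does not spell out its exact formula; the common variance-stabilizing form arcsin(sqrt(p)) is assumed here, mapping proportions onto angles.

```python
# Sketch of the inverse sine (arcsine) transformation applied to platoon
# percentages before analysis. The exact variant used in the paper is not
# stated; the standard form arcsin(sqrt(p)) is assumed, which maps
# proportions 0..1 onto angles 0..pi/2 and stabilizes the variance of
# percentages near the ceiling (over 90% "yes" ratings here).
import math

def arcsine_transform(percentage):
    p = percentage / 100.0            # convert percent to proportion
    return math.asin(math.sqrt(p))    # angle in radians

print(round(arcsine_transform(90.0), 4))   # a platoon with 90% "yes" ratings
```

The transform spreads out values compressed near 100%, which is why it helps with the strong ceiling effect in these ratings.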
Statistical Analyses<br />
A quasi-experimental comparison of time trends between the<br />
baseline and SIMNET-trained groups was used to assess transfer<br />
effects from the added training. The date of graduation for each<br />
class was the main independent variable in regression analyses.<br />
The effects of primary interest were changes in intercept and<br />
slope of the trend over time from those shown by the baseline<br />
platoons. Team Chiefs, coded as dummy control variables, were<br />
used to partial out differences in platoon averages associated<br />
with instructor teams. Other variables in the analysis of field<br />
exercise ratings were leader position and day of MTT. Effects of<br />
these variables were found to be independent of the time trends,<br />
and are not presented here. See Bessemer (In publication) for<br />
further details on the analyses and statistical results.<br />
Results<br />
In baseline AOB classes, the number of movement evaluations,<br />
but not contact evaluations, declined for elementary exercises, as<br />
Figures 1 and 2 show. The total elementary evaluations in Figure<br />
3 combine these categories. The baseline change reflects efforts<br />
made to conserve training resources. Contact evaluations were<br />
reduced further in classes with SIMNET and HMMWV training. In<br />
contrast, for baseline classes in Figure 4, evaluations counted<br />
in advanced exercises showed no trend. These evaluations then<br />
increased in number after the added training began. Thus, SIMNET<br />
and/or HMMWV training produced some immediate savings in the<br />
amount of elementary MTT training, which then was replaced by<br />
more advanced training exercises in the later AOB classes.<br />
The effect for field ratings shown in Figure 5 was like that<br />
for advanced evaluations. Average student ratings across classes<br />
gradually increased after the SIMNET and HMMWV training was added<br />
to the AOB course, indicating positive transfer to performance in<br />
the student's initial MTT exercise emerged in later classes.<br />
For the graduate quality measure, the best-fitting trends<br />
shown in Figure 6 were not quite significant. Results for the<br />
first SIMNET-trained class are aberrant owing to a change in the<br />
wording of the rating scale in the next class. Omitting the<br />
first class after the baseline, a rank-sum test showed that<br />
graduate quality increased significantly in later classes.<br />
Discussion<br />
The tactical training added to the AOB Course was associated<br />
with three major effects. First, elementary contact exercises<br />
conducted in the MTT decreased in number, and were gradually<br />
replaced by additional advanced exercises involving defense and<br />
offense missions. Second, positive transfer in terms of improved<br />
field exercise performance in the MTT emerged gradually after the<br />
pre-MTT training was expanded by SIMNET training and HMMWV field<br />
exercises. Third, there were indications that the transfer<br />
effect persisted to enhance the judged quality of AOB graduates,<br />
at least for the last classes examined. Careful consideration of<br />
several possible confounding factors led to the conclusion (see<br />
Bessemer, In publication) that SIMNET training, rather than HMMWV<br />
training, was largely responsible for the observed transfer<br />
effects. The gradual emergence of these effects over an extended<br />
time was interpreted as reflecting the accumulation of instructor<br />
experience in using SIMNET to train platoon tactics.<br />
153
Figure 1. Adjusted number of<br />
performance evaluations per<br />
platoon for AOB students in<br />
movement exercises during MTT.<br />
Figure 3. Adjusted number of<br />
performance evaluations per<br />
platoon for AOB students in<br />
elementary MTT exercises.<br />
Figure 2. Adjusted number of<br />
performance evaluations per<br />
platoon for AOB students in<br />
contact exercises during MTT.<br />
Figure 4. Adjusted number of<br />
performance evaluations per<br />
platoon for AOB students in<br />
advanced MTT exercises.<br />
This evidence for positive transfer helps the Army to show<br />
that its investment in networked simulation devices has value for<br />
officer school training. More importantly, these findings have<br />
significant general implications for how the Army conducts device<br />
training effectiveness tests, and how it uses devices. The value<br />
of training devices may be seriously underestimated in tests if<br />
trainers are not allowed sufficiently extended experience to<br />
learn how to train effectively using the device. Instructors in<br />
many tests have only been taught to operate the device, and have<br />
trained few soldiers on the device before training the test<br />
Figure 5. Adjusted mean<br />
performance rating by platoon<br />
for AOB students in their first<br />
exercise rated during MTT.<br />
Rating limits are ±100.<br />
Figure 6. Adjusted mean arcsine<br />
percentage of items rated "yes"<br />
on the Comprehensive Student<br />
Evaluation for AOB platoons.<br />
Angle limits are ±π/2.<br />
sample. Quasi-experimental test designs can help overcome this<br />
problem, as well as limited statistical power imposed by small<br />
sample size. Many military training exercises are performed<br />
repeatedly by units in an annual training cycle. Collection of<br />
training records and appropriate performance measures can provide<br />
a large sample of baseline data to compare with results achieved<br />
with new training devices.<br />
The full benefits of training will not be obtained from<br />
fielded devices without consistently giving every trainer adequate<br />
experience to learn how to train most effectively. Turnover<br />
in unit trainers, and infrequent device use are factors that<br />
work against keeping instructor experience at a high level, and<br />
reduce the potential effectiveness of device training.<br />
References<br />
Bessemer, D. W. (In publication). Transfer of SIMNET training in<br />
the Armor Officer Basic Course (ARI Technical Report). Alexandria,<br />
VA: U.S. Army Research Institute for the Behavioral<br />
and Social Sciences.<br />
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation:<br />
Design and analysis issues for field settings. Chicago: Rand-<br />
McNally.<br />
Gound, D., & Schwab, J. (1988, March). Concept evaluation<br />
program of Simulation Networking (SIMNET) (Final Report,<br />
TRADOC TRMS No. 86-CEP-0345). Fort Knox, KY: U.S. Army Armor<br />
and Engineer Board. (Available from Commander, U.S. Army Armor<br />
Center and Fort Knox, ATTN: ATZK-DS, Fort Knox, KY 40121-5180)<br />
CONTINGENCY TASK TRAINING SCENARIO GENERATOR<br />
1Lt Todd S. Dart<br />
2Lt Jody A. Guthals<br />
Maj Timothy M. Bergquist<br />
Air Force Human Resources Laboratory<br />
INTRODUCTION<br />
The Contingency Task Training (CTT) project was directed at determining<br />
critical skills necessary in wartime or during mid- to low-intensity conflicts.<br />
Subsequently, this knowledge would be used for training. The Air Force Human<br />
Resources Laboratory (AFHRL) was tasked with developing the methodology at the<br />
request of Headquarters Air Training Command (HQ ATC) and the U.S. Air Force<br />
Occupational Measurement Center (USAFOMC). The concept for CTT originated in a<br />
study performed by USAFOMC in 1979, entitled the Air Base Ground Defense Tactics<br />
Analysis. A task survey for security police (SP) personnel was combined with a<br />
simple scenario in order to determine which tasks are more important during a<br />
given situation. The study was highly effective in restructuring the SP field,<br />
so much so that HQ ATC requested the technology be developed for combining task<br />
surveys with contingency scenarios. USAFOMC in turn produced Request For<br />
Personnel Research (RPR) 84-02, 'Contingency Task Training Requirements,' asking<br />
AFHRL to develop and validate the contingency technology.<br />
AFHRL started the CTT project in 1988. In order to develop, test, and<br />
validate scenarios for use with task surveys, the project was divided into two<br />
phases. Phase I was the development of the scenario generation technology.<br />
Phase II involved coupling scenarios to task surveys. The scenario and task<br />
survey would then be sent to senior noncommissioned officers (NCOs) who would<br />
review the scenario and rate each task listed for their respective jobs as to<br />
training emphasis. The results would then be validated against Specialty<br />
Training Standards (STS) which list skills each airman is to be instructed in to<br />
reach certain levels of proficiency. Some of the skills in the STS are marked<br />
with an asterisk signifying tasks to be taught during wartime. All other tasks<br />
not marked are to be dropped from instruction. The method of choosing which<br />
tasks to mark has always been left up to the senior NCO. In the past, marking<br />
wartime skills has been done at the last minute during course re-evaluation.<br />
Also, marked skills have never been validated.<br />
The purpose of the CTT project was to provide a method to validate wartime<br />
skills. AFHRL undertook the task of creating scenario generation technology and<br />
subsequent validation via task surveys. AFHRL has now completed Phase I of the<br />
project (Dart, 1990). The Phase II task survey will be performed by USAFOMC.<br />
CRITERIA<br />
HQ ATC and USAFOMC were consulted to determine exactly what the scenario<br />
generator should comprise. Initial research indicated a need for a scenario<br />
generator able to generate both natural disaster scenarios and conflict/wartime<br />
scenarios. Later, the focus changed to conflict/wartime scenarios only. A<br />
disaster scenario generator used in developing training for a disaster situation<br />
occurring infrequently in a small region would not be cost effective. The<br />
technology should concentrate on the mission of the Air Force, national defense,<br />
and the implementation of U.S. Armed Forces as part of national policy.<br />
The scenario must be short, concise, and realistic. A poorly written<br />
credible scenario is better than a well written unbelievable one. According to<br />
experts in scenario generation, a scenario should provide the minimum amount of<br />
information to describe the situation (deLeon, 1973). People can only absorb a<br />
finite amount of data, and fine detail may distract the reader from the overall<br />
intent of the scenario. Only critical scenario variables had to be selected.<br />
The scenario descriptions are intended for use with task surveys; hence, they<br />
must 'paint' a conflict situation with application to all Air Force Specialties<br />
(AFSs).<br />
Consideration of the user group is also important. The CTT scenario<br />
generator is intended for use by people inexperienced with creating a scenario.<br />
Additionally, the primary user group, USAFOMC, is relatively small.<br />
To ensure necessary scenario generation guidelines are followed, and<br />
because of user inexperience, the scenario generator should be automated. The<br />
optimal design, with users in mind, would be a small program operable on DOS-<br />
compatible microcomputers. The inclusion of an on-line help function providing<br />
definitions of all contingency variables was also deemed important.<br />
APPROACH<br />
Initial Research Existing scenario generators were investigated prior to<br />
any development. Typically, scenario generators are used in war games. They<br />
mainly deal with overall battle management as opposed to individuals.<br />
Therefore, the standard scenario generator used for combat tactics was of little<br />
to no use for CTT.<br />
A preliminary scenario design system was being developed by the U.S. Army<br />
for training intelligence gathering skills to intelligence officers. The Low<br />
Intensity Conflict Study Group of the U.S. Army Intelligence Center and School<br />
at Fort Huachuca, AZ developed a non-automated scenario generator for creating<br />
low-intensity conflict (LIC) scenarios (Smiley, 1989). Since the material<br />
suited the needs of the CTT project, permission was obtained to use the<br />
variables in the CTT scenario generator.<br />
The Army’s material was appropriate for use in a LIC scenario. Future<br />
warfare is forecast to be primarily in the LIC arena, but may also include<br />
'normal' or high-intensity conflicts and mid-intensity conflicts such as<br />
Vietnam. The CTT scenario generator enhanced the Fort Huachuca version to<br />
include variables pertinent to all levels of combat. Also, certain definitions<br />
and variables were modified to apply directly to the Air Force and its mission.<br />
The different levels of conflict intensity provided structure for further<br />
scenario generator development. Definitions of high-, mid-, and low-intensity<br />
conflicts were extracted from Army FM 100-20, and are listed in Table 1.<br />
Numerous other sources also provided input into the scenario generator<br />
design. Work done by the AFHRL Logistics and Human Factors Division (LR), called<br />
the Combat Maintenance Capability, provided information on collecting<br />
contingency skills information (Dunigan et al., 1985). They had developed a<br />
methodology to determine wartime maintenance tasks. Maintenance specialists<br />
were asked to indicate the work unit codes (WUCs) for repairs performed<br />
on aircraft. The scenario was set at Hahn AB, Germany during a Warsaw Pact<br />
offensive. Their study provided information helpful to contingency scenario<br />
design and task data collection. The Combat Maintenance Capability study<br />
evaluated several computer models, the most notable being the Logistics<br />
Composite Model (LCOM), the Theater Simulation of Airbase Resources (TSAR),<br />
and the Theater Simulation of Airbase Resources inputs using AIDA (TSARINA).<br />
157
HIGH INTENSITY CONFLICT is war between two or more nations and their<br />
respective allies, if any, in which the belligerents employ the most modern<br />
technology and all resources in intelligence; mobility; firepower (including<br />
nuclear, chemical, and biological weapons); command, control, and<br />
communications; and service support.<br />
MID-INTENSITY CONFLICT is war between two or more nations and their<br />
respective allies, if any, in which belligerents employ the most modern<br />
technology and all resources in intelligence; mobility; firepower (excluding<br />
nuclear, chemical, and biological weapons); command, control, and<br />
communications; and service support for limited objectives under definitive<br />
policy limitations as to the extent of destructive power that can be<br />
employed or the extent of geographic area that might be involved.<br />
LOW INTENSITY CONFLICT is internal defense and development assistance<br />
operations involving actions by U.S. combat forces to establish, regain, or<br />
maintain control of specific land areas threatened by guerrilla warfare,<br />
revolution, subversion, or other tactics aimed at internal seizure of power.<br />
Table 1. CONFLICT DEFINITIONS<br />
Additional information on LIC scenarios came from the Army-Air Force Center<br />
For Low Intensity Conflict (CLIC) and the Joint Warfare Center.<br />
Work to determine medical wartime tasks is being done by the Medical<br />
Wartime Hospital Integration Office (MWHIO) at Fort Detrick, MD. The project,<br />
titled WARMED, is designed to determine the critical wartime skills needed by<br />
medical personnel (Meinders, 1987). Concerns by WARMED directors that the CTT<br />
project would overlap their own results and recommendations led AFHRL to avoid<br />
the medical field entirely in the scenario generator design.<br />
Other information sources included the Air Training Command Office of Wartime<br />
Plans (ATC/DPX) and the Headquarters Air Force Management Engineering Agency<br />
(HQ AFMEA), Wartime Manpower Division. HQ AFMEA was concerned about common<br />
tasks, those critical for a wartime situation yet performed by all AFSs. For<br />
example, personnel in any specialty should know the tasks required for<br />
donning protective chemical gear. Most common tasks are survival skills in<br />
which everyone should be trained. The Air Force, while training some common<br />
tasks, does not have an active program of ensuring common tasks are learned and<br />
maintained by all personnel. The Army does have such a program and routinely<br />
tests all soldiers' skills listed in a series of pamphlets appropriately<br />
entitled the Soldier's Manual of Common Tasks (STP 21-1-SMCT, 1987). The<br />
concept of common tasks and peacetime tasks is best illustrated in Figure 1.<br />
Preliminary Design The result of the literature review and consultations<br />
was a manual scenario generator consisting of several categories with pertinent<br />
variables. A variable dictionary was also created to aid in choosing the<br />
correct variables for a scenario. Variables not found in the Army’s LIC<br />
generator but included in the CTT scenario generator are factors describing the<br />
environment in more detail. Choice of variables was based on deLeon's work,<br />
which recommended appropriate material to include in any scenario.<br />
Once a written version of the scenario generator was developed, the<br />
necessity, and later the feasibility, of an automated process became apparent. A<br />
computer program designed to create the scenario greatly enhances the speed<br />
and consistency of scenario generation.<br />
158
[Figure 1. AFS Task Relationships: peacetime tasks alongside contingency<br />
tasks, the latter spanning low-, mid-, and high-intensity conflict, with<br />
conventional and nuclear/biological/chemical components.]<br />
Scenario Generator Pascal was selected as the programming language since it<br />
provided the necessary versatility and relative ease of use. The program was<br />
written in Turbo Pascal version 4.0 and is very simple to use. The<br />
program's operation speed increases when it is loaded onto a hard drive or RAM<br />
drive. It will run on any IBM (DOS) compatible microcomputer with 512K RAM. A<br />
color monitor is recommended but not required. The program consists of 2 files<br />
and is easily contained on a single 360K floppy disk.<br />
The main program, CTT.EXE, is menu driven. It presents all the<br />
variable categories developed for the manual version. In a step-by-step process<br />
beginning with the selection of intensity level, variable categories are<br />
presented with the specific variables listed for user selection. Table 2<br />
provides a listing and description of the variable categories in the program.<br />
There are nine categories possible for high-intensity conflict and eight for mid<br />
intensity. Low-intensity conflict has twelve categories, four more than for mid<br />
intensity, due to the complex nature of LICs.<br />
Variables are entered one at a time and stored in the computer until the<br />
last variable is selected, whereupon all variables are placed in a standard<br />
scenario format. In most cases, the variables are listed in a sentence format<br />
with no additional information. However, for a few of the variables, additional<br />
information is drawn from a small 'library' within the program and displayed in<br />
the final scenario. This feature serves to enhance the quality of the scenario<br />
produced. Time constraints prevented displaying additional information for all<br />
variables, although such an improvement is recommended in future development.<br />
The second file, CTT-REV.HLP, contains variable definitions. This on-line<br />
‘help’ function is a useful feature of the program. When accessed, it provides<br />
a complete definition of all variables listed in the scenario generator.<br />
Supplementary information for many of the variables is also provided. This file must be<br />
loaded on the same disk with the scenario generator program to be accessed.<br />
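The step-by-step flow just described can be sketched in a few lines. This is an illustration only, written in Python rather than the Turbo Pascal of the actual CTT.EXE; the category names are abridged from Table 2, and the choice values and 'library' text are invented placeholders.

```python
# Illustrative sketch only: the actual CTT.EXE was a menu-driven Turbo
# Pascal 4.0 program, so this Python rework is a stand-in.  Category
# names follow Table 2 (abridged; the real program has nine categories
# for high, eight for mid, and twelve for low intensity), while the
# choice values and the supplementary "library" text are invented.

CATEGORIES = {
    "high": ["Conflict Intensity", "Highest NBC Threat Level", "Attrition",
             "Logistics", "Terrain", "Season", "Mission Duration"],
    "mid":  ["Conflict Intensity", "Attrition", "Logistics", "Terrain",
             "Season", "Mission Duration"],
    "low":  ["Conflict Intensity", "Attrition", "Logistics", "Terrain",
             "Season", "Mission Duration", "LIC Situation", "Threat Type"],
}

# Small 'library' of supplementary text displayed for a few variables.
LIBRARY = {
    ("Terrain", "desert"): "Expect extreme day/night temperature swings.",
}

def generate_scenario(intensity, selections):
    """Place the user's selections into the standard scenario format."""
    lines = ["CONTINGENCY SCENARIO (%s INTENSITY)" % intensity.upper()]
    for category in CATEGORIES[intensity]:
        choice = selections[category]
        lines.append("%s: %s" % (category, choice))
        extra = LIBRARY.get((category, choice))
        if extra:                      # enrich the scenario where possible
            lines.append("  " + extra)
    return "\n".join(lines)
```

In the real program the user picks each variable from a menu one at a time; here the selections are passed in as a dictionary so the assembly step can be shown in isolation. The resulting text could then be printed or written to an ASCII file, as the program allows.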
The program will allow the user to generate low-, mid-, and high-intensity<br />
conflicts. Special emphasis is given to low intensity conflicts as they are<br />
typically the most intricate. Interestingly, the low-intensity conflict is<br />
rapidly becoming the most common conflict the U.S. will face in coming years.<br />
159<br />
-Conflict Intensity: choice of the degree of conflict intensity.<br />
-Highest NBC Threat Level (High Intensity Only): description of the<br />
amount of nuclear, biological, and chemical protective clothing worn.<br />
-Attrition: the amount of critical personnel and equipment<br />
damaged/wounded and destroyed/killed per month.<br />
-Logistics: the amount of critical supplies required to perform the<br />
mission which actually reach the combat area.<br />
-Environment<br />
-- Area and Size: the size or area of operations.<br />
-- Sub-Terrain: choice of three areas of man-made environments.<br />
-- Terrain: choice of terrain combined with season presents a<br />
detailed climatic description.<br />
-- Season: choice of four seasons.<br />
-Mission Duration: the length of time the scenario will last.<br />
-Command And Control: presents choices for particular commands or joint<br />
operation.<br />
-Low Intensity Conflict Only<br />
-- LIC Situations: eight of the most common types of situations.<br />
-- Operational Category: describes the general intent of the<br />
military operation undertaken to either combat or facilitate<br />
the LIC situation.<br />
-- Threat Type: type of threat US forces are expected to face<br />
during the LIC situation.<br />
-- Threat Support: the type of popular support the threat will<br />
receive.<br />
Table 2. VARIABLE CATEGORIES<br />
The final scenarios produced by the program are short and simple as was<br />
specified by experts in scenario design. The user has options to print the<br />
scenario or send it to a computer disk file. If copied to a disk the scenario<br />
can then be modified using any word processor program that reads ASCII.<br />
The Contingency Scenario Generator User’s Manual (Dart & Guthals, 1990)<br />
provides additional information on the variables and the use of the program.<br />
VALIDATION<br />
Evaluation involved conducting what was termed a ‘reality check’. To<br />
perform the reality check, the program was taken to several wartime planning<br />
offices.<br />
The HQ ATC Technical Training (HQ ATC/TTIR) division, HQ AFMEA, and the<br />
School of Aerospace Medicine, Battlefield Readiness (USAFSAM/EDO) office, Brooks<br />
AFB, were asked to review the program and provide input into its improvement.<br />
In addition to the above mentioned sources for scenario evaluation, other<br />
sources were contacted concerning specific aspects of the generator. Most<br />
notable was the value for attrition given in the scenario. The Air Force<br />
Wartime Manpower, Personnel and Readiness Team (AFWMPRT) at Fort Ritchie, MD<br />
provided valuable information in this regard.<br />
The evaluation of the scenario generator by war planning experts led to<br />
several recommendations for further development. Those that were easy and<br />
straight-forward to implement in the time available were incorporated into the<br />
scenario generator. Unfortunately, to implement several recommendations would<br />
160
have involved complicated procedures or a major reprogramming of the generator.<br />
Therefore, although they would enhance the generator, those recommendations were<br />
not implemented.<br />
CONCLUSION<br />
During the CTT project a methodology was developed to design contingency<br />
scenarios. They can be used with task surveys to identify wartime tasks and<br />
subsequently, the needed training requirements. The project makes use of the<br />
latest information in scenario design and variable definition from both the Air<br />
Force and the Army.<br />
Phase I of the CTT project has been completed. The CTT scenario generator<br />
has proven to be successful in its attempt to provide a suitable contingency<br />
scenario. In fact, while the program was originally designed for use with task<br />
surveys at USAFOMC, it has already been adopted by USAFSAM/EDO for designing<br />
scenarios for contingency instruction of medical officers.<br />
Phase II of the CTT project, determining wartime skills through task<br />
surveys, will be undertaken and completed as appropriate by USAFOMC.<br />
REFERENCES<br />
Army Field Manual 100-20 (1988). Low Intensity Conflict. Washington, D.C.:<br />
Department of the Army.<br />
Dart, T. S., & Guthals, J. A. (1990). Contingency Scenario Generator User's<br />
Manual (AFHRL-TP-90-74). Brooks AFB, TX: Manpower and Personnel<br />
Division, Air Force Human Resources Laboratory.<br />
Dart, T. S. (1990). Contingency Task Training (AFHRL-SR-90-73). Brooks AFB,<br />
TX: Manpower and Personnel Division, Air Force Human Resources Laboratory.<br />
deLeon, P. (1973). Scenario Designs: An Overview (R-1218-ARPA). Santa<br />
Monica, CA: Rand Corporation.<br />
Dunigan, J. M., Dickey, G. E., Borst, M. B., Navin, D., Parham, D. P., Weiner,<br />
R. E., & Miller, T. M. (1985). Combat Maintenance Capability: Executive<br />
Summary (AFHRL-TR-85-35). Wright-Patterson AFB, OH: Logistics and Human<br />
Factors Division, Air Force Human Resources Laboratory.<br />
Meinders, M. (December 1987). Talking Paper on Wartime Medical (WARMED) Work<br />
Center Description (WCD). Fort Detrick, MD: Medical Wartime Hospital<br />
Integration Office (MWHIO).<br />
Smiley, A. A. (January 1989). Low Intensity Conflict Scenarios. Fort Huachuca,<br />
AZ: Low Intensity Conflict Study Group, U.S. Army Intelligence Center and<br />
School.<br />
Soldier Training Publication 21-1-SMCT (October 1987). Soldier's Manual of<br />
Common Tasks. Washington, D.C.: Department of the Army.<br />
USAFOMC (April 1979). Air Base Ground Defense Tactics Analysis (AFPT 90-812-<br />
137, 90-812-138). Randolph AFB, TX: Occupational Survey Branch, USAFOMC.<br />
161
COOPERATIVE LEARNING IN THE ARMY<br />
RESEARCH AND APPLICATION<br />
Angelo Mirabella<br />
U.S. Army Research Institute for the<br />
Behavioral and Social Sciences<br />
Cooperative learning (CL) is something many of us did in<br />
college when we took chemistry, physics, or calculus - courses<br />
built around problem solving exercises. We joined with a few other<br />
students to do homework or study for a test. We shared our<br />
understandings of the problems, helped each other correct<br />
misconceptions, and then reached consensus on how to solve the<br />
problems. Today, formally organized cooperative learning groups<br />
in classroom settings do the same things. Therefore, CL is not new<br />
or revolutionary. Yet, as a formal, institutionalized philosophy<br />
and methodology of instruction, it has been slow to take root,<br />
especially in the military. Public primary schools have progressed<br />
further. In Columbia, Maryland, for example, there is an elementary<br />
school whose classes are taught completely according to CL<br />
principles. The teacher primarily facilitates the work of small<br />
groups. The students and their activities rather than the teachers<br />
are the centers of attention.<br />
It is ironic that CL has taken so long to root since the more<br />
traditional teacher-centered approach, and even more modern<br />
individualized approaches are social arrangements contrary to what<br />
is demanded of people once they leave the school house (Raspberry,<br />
1987, 1988). This contradiction was especially blatant in my<br />
elementary school days when talking among students was punished<br />
with demerits, detention, or dreaded calls to parents to come to<br />
school for a conference. I remember vividly a visit by my father,<br />
who had to lose a day's pay to learn from my teacher that I talked<br />
too much in class. I also recall the back of his hand when I tried<br />
to explain my behavior and give my side of the story. Anyone who<br />
knows me would be astonished to learn that I once suffered from<br />
talkativeness. I often wonder what destructive social conditioning<br />
was imposed by such an environment.<br />
In contrast, cooperative learning implicitly recognizes that<br />
interpersonal relationship, i.e. communication, is one of the most<br />
pervasive and critical sets of skills anyone can learn. And what<br />
better place to foster such skills than the school house.<br />
Cooperative learning, is, at the same time, an effective way to<br />
develop other skills and thereby stretch instructional resources.<br />
At least that is the emerging conclusion of many years of basic<br />
research and some very preliminary applied research by the Army<br />
Research Institute. I'm hedging a bit because CL in the Army is in<br />
its infancy, and the work to be reported, while providing<br />
converging evidence for the effectiveness of CL, was not done in<br />
pristine, antiseptic laboratories.<br />
162<br />
I'd like to start with a brief overview of basic research on<br />
CL and then review efforts by the Army to test the methodology in<br />
Training and Doctrine Command (TRADOC) schools.<br />
McNeese (1989) provides a useful tabulation of CL research.<br />
He cites Slavin's 1983 review showing that of 46 studies, 29 had<br />
shown favorable effects, 15 no differences, and 2 showed advantages<br />
for "traditional" education. Johnson & Johnson (1985) reported that<br />
out of 26 studies 21 were favorable, two showed mixed results, and<br />
3 no differences; on balance, strong support for the value of CL.<br />
However, the reviews suggest that merely grouping students is not<br />
enough. Students do have to cooperate. Slavin goes further and says<br />
that group incentives, coupled with individual responsibility, are<br />
essential.<br />
Otherwise, CL works in many different circumstances. Johnson,<br />
Maruyama, Johnson, Nelson, and Skon (1981), in a review of 122<br />
studies extending from 1924 to 1981, found that CL was effective<br />
across a range of ages, subjects, and tasks. What emerges, from<br />
these reviews, is a type of conclusion often found for new<br />
performance technology. Cooperative Learning can be effective if<br />
properly designed. This conclusion applied to academic settings.<br />
Would it also apply to Army Schools?<br />
Research to answer this question was done at Fts. Lee and Knox<br />
and implemented at Ft. Lee under TRADOC-ARI partnerships called<br />
Training Technology Field Activities (TTFAs). I first want to say<br />
a word about these, to provide a perspective on why this research<br />
was undertaken. In 1983 the TRADOC Commanding General concluded<br />
that his schools were not capitalizing on a steady stream of new<br />
ideas and technology emerging from the training R & D community.<br />
He wanted to establish a formal link from basic research to the<br />
Army's training community. Accordingly he invited ARI to join with<br />
selected schools in TTFAs. Their purpose was to test new training<br />
technology, on significant Army problems, using TRADOC testbeds.<br />
The schools and TRADOC HQ were to lead in identifying test bed<br />
problems, while ARI was to lead in identifying a prototype<br />
research-based solution. The partners were then to join forces in<br />
testing the solution.<br />
Activities were established at several schools including<br />
Quartermaster at Ft. Lee, Virginia and Armor at Ft. Knox, Kentucky.<br />
Cooperative learning projects were undertaken at these schools<br />
because basic research had shown that CL can be very efficient. But<br />
it had to be proven and implemented in Army settings. I'll mention<br />
the Knox work briefly and then focus on the work at Lee, since this<br />
was implemented and is still being used.<br />
Shlechter (1987) at Ft. Knox compared training effectiveness<br />
for cooperative groups of 2 or 4 students, and for individuals in<br />
the 19K MOS (Tank Commanders). From computer-based instruction (on<br />
MICROTICCIT) each student had to learn to interpret radio call<br />
signs and communicate in coded messages, tasks for which<br />
performance deficiencies had been documented.<br />
163
Improvements in performance were the same across training<br />
conditions, but the 4-student groups needed only two-thirds the<br />
time required by individuals to achieve comparable performance.<br />
Individuals and 2-student groups were statistically the same here.<br />
Both the 4- and 2-student groups made substantially fewer demands<br />
on instructor time as measured by "calls for proctor assistance,"<br />
e.g., 0 and 27 respectively, compared to 115 calls from individuals.<br />
At the Ft. Lee TTFA, Hagman and Hayes (1986) examined the<br />
effectiveness of cooperative methods in a more traditional,<br />
non-computer-based setting. They wanted to define specific conditions<br />
under which CL would and would not work. From a review of the<br />
literature, they hypothesized that the effectiveness of CL increases<br />
with increasing group size, though only when incentives were<br />
provided which encourage group members to share knowledge.<br />
Subjects were drawn from one unit (annex) of instruction in<br />
the 76C MOS Advanced Individual Training (AIT) course for supply<br />
clerks. Within this unit, the students receive a series of lectures<br />
each followed by a practical exercise (PE). Midway and at the end<br />
of the unit, the students are individually tested. Students who<br />
fail, go to study hall for remediation. Those who fail a retest are<br />
"recycled," i.e., required to repeat the annex.<br />
For the experiment, students were assigned to one of three group-size<br />
conditions. They did the PEs alone, in groups of 2, or in<br />
groups of 4. The groups of 2 and 4 were further divided into two<br />
incentive conditions. Under one condition (Group Incentive), if any<br />
student in the group failed, every group member went to study hall.<br />
Under a second condition (Individual Incentive) only the failing<br />
student went to study hall. Hagman and Hayes predicted that under<br />
a group incentive (i.e. everyone to study hall), performance would<br />
increase as group size increased, but decrease with increasing<br />
group size under individual incentive.<br />
Results partially supported this prediction. For each of the<br />
two tests in the annex, groups of four were clearly optimal under<br />
the group incentive. But statistically, groups of two did not<br />
outperform individuals. Recall that Shlechter had found a similar<br />
result at Ft. Knox. These similar results suggest, as a preliminary<br />
conclusion, that CL groups should contain more than two people.<br />
Other results supported the value of CL, but were inconsistent with<br />
the main hypothesis of the experiment. During the PEs, cooperative<br />
groups made fewer errors than did individuals, with or without the<br />
group incentive. In fact incentive made no difference at all.<br />
A potentially negative effect of CL was that groups took<br />
longer to complete PEs than did individuals. Not surprising since<br />
CL requires time for students to exchange information and ideas.<br />
However, if the added time does not exceed reasonable amounts of<br />
available instructional time or is offset by benefits, it can be<br />
discounted. Both conditions were satisfied in this study.<br />
164
Brooks et al (1987) did follow-up research, using the entire<br />
76C course as a test bed, to assess further the benefits of<br />
cooperative learning. This was actually a full-scale implementation<br />
test with an additional measure: recycle rate. Recycle rate is the<br />
percentage (per course) of students who fail end-of-annex tests,<br />
attend study hall for remedial review of material, fail a second<br />
time, and then repeat the annex. All students in three cooperative<br />
classes worked in groups of four - 34 groups for a total of 136<br />
students. These were compared with students in three other,<br />
regularly scheduled and conducted classes with a combined<br />
enrollment of 128 students.<br />
Results. The bad news was that the Hagman and Hayes finding<br />
of improved test scores for groups of four compared to individuals<br />
was not seen by Brooks et al. Compounding the bad news was the<br />
agreement with Hagman and Hayes that CL students took longer than<br />
individual students to complete PEs, though here again they<br />
finished within the allotted training time. The investigators<br />
checked to see if a treatment-aptitude interaction might be<br />
buried in the data. They divided subjects into high and low scorers<br />
on the ASVAB Clerical scale, but found no interaction with training<br />
method.<br />
The good news was that CL students made fewer errors in the<br />
PEs (as in the Hagman study) and that recycle rate was reduced from<br />
10.9% to 4.4%, i.e. 60% lower for CL students than for individuals.<br />
Brooks et al extrapolated this saving to a year's worth of classes<br />
(about 3,000 students) and estimated a cost reduction of $136,000.<br />
Not a large sum in the bigger scheme of things, but if CL were<br />
implemented Army-wide, the savings could be significant. Moreover,<br />
achievement scores in CL classes were not worse than in the<br />
"conventional" comparison classes. This would be especially<br />
constructive for CL in a computer-based classroom because it<br />
supports assigning one workstation to 3 or 4 students, thereby<br />
reducing the demands for expensive hardware. A potentially positive<br />
effect, demand on instructor time, was not assessed, but may have<br />
been present. Recall in the Shlechter studies, CL students required<br />
notably less instructor help than did individuals. Finally,<br />
students and instructors preferred CL to individual practice.<br />
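As a sanity check, the recycle-rate figures reported above can be reproduced with a few lines of arithmetic (sketched here in Python for illustration; note that the per-recycle cost is implied by the reported numbers rather than stated in the paper).

```python
# Rough check of the Brooks et al. extrapolation.  All inputs are the
# figures reported in the text; the per-recycle cost is derived.
baseline_rate = 0.109        # recycle rate, conventional classes
cl_rate       = 0.044        # recycle rate, cooperative-learning classes
students_per_year = 3000     # about a year's worth of classes
annual_savings    = 136_000  # dollars, as reported

reduction = (baseline_rate - cl_rate) / baseline_rate
recycles_avoided = students_per_year * (baseline_rate - cl_rate)
implied_cost_per_recycle = annual_savings / recycles_avoided

print(round(reduction * 100))            # about 60 (percent lower)
print(round(recycles_avoided))           # 195 fewer recycles per year
print(round(implied_cost_per_recycle))   # roughly 697 dollars each
```

The 60% figure in the text thus follows directly from the two recycle rates, and the $136,000 estimate corresponds to a cost on the order of $700 per avoided recycle.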
Outcome of the Research. The work by Shlechter, Hagman and<br />
Hayes, and Brooks et al, as well as a solid foundation of prior<br />
basic research led AR1 to recommend that the Quartermaster School<br />
implement cooperative learning. Brooks (1987) wrote an instructor's<br />
manual on how to set up and manage a CL classroom. The methodology<br />
has since been used in AIT for 76C MOS. Moreover, if and when<br />
computer-based instruction becomes wide-spread in the Army, this<br />
same methodology could save millions of dollars. With multiple<br />
students per workstation, the number of required stations could be<br />
reduced by two-thirds to three quarters. And cooperative learning<br />
could very well revolutionize the way the Army trains.<br />
165
REFERENCES<br />
Brooks, J.E. (1987). An Instructor's Guide for Implementing<br />
Cooperative Learning in the Equipment Records and Parts<br />
Specialist Course (ARI Research Product 87-35).<br />
Alexandria, VA: US Army Research Institute for the<br />
Behavioral and Social Sciences.<br />
Brooks, J.E., Cormier, S.M., Dressel, J.D., Glaser, M., Knerr,<br />
B.W., & Thoreson, R. (1987). Cooperative Learning: A New<br />
Approach for Training Equipment Records and Parts<br />
Specialists (ARI Technical Report 760). Alexandria, VA:<br />
US Army Research Institute for the Behavioral and Social<br />
Sciences.<br />
Hagman, J.D., & Hayes, J.F. (1986). Cooperative Learning: Effects<br />
of Task, Reward, and Group Size on Individual Achievement<br />
(ARI Technical Report 704). Alexandria, VA: US Army<br />
Research Institute for the Behavioral and Social Sciences.<br />
Johnson, D.W., & Johnson, R.T. (1985). The Internal Dynamics of<br />
Cooperative Learning Groups. In R.E. Slavin, S. Sharan, S.<br />
Kagan, R.H. Lazarowitz, C. Webb, & R. Schmuck (Eds.), Learning<br />
to Cooperate, Cooperating to Learn. New York: Plenum.<br />
Johnson, D.W., Maruyama, G., Johnson, R.T., Nelson, D., & Skon, L.<br />
(1981). The Effects of Cooperative, Competitive, and Individualistic<br />
Goal Structures on Achievement: A Meta-analysis. Psychological<br />
Bulletin, 89, 47-62.<br />
McNeese, M.D. (1989). Explorations in Cooperative Systems: Thinking<br />
Collectively to Learn, Learning Individually to Think (AAMRL-<br />
TR-90-004). Wright-Patterson Air Force Base, Ohio: Armstrong<br />
Aerospace Medical Research Laboratory.<br />
Raspberry, W. (1987, September 29). Why Should Kids Compete in<br />
Class? Washington Post.<br />
Raspberry, W. (1988, August 1). From School to the Real World.<br />
Washington Post.<br />
Shlechter, T.M. (1987). Grouped Versus Individualized Computer-<br />
Based Instruction (CBI) Training for Military Communications<br />
(ARI Technical Report 1438). Alexandria, VA: US Army Research<br />
Institute for the Behavioral and Social Sciences.<br />
Slavin, R.E. (1983). When Does Cooperative Learning Increase<br />
Achievement? Psychological Bulletin, 94, 429-445.<br />
166<br />
BATTLE-TASK/BATTLEBOARD TRAINING<br />
APPLICATION PARADIGM AND RESEARCH DESIGN<br />
John C. Eggenberger, PhD, Director, Personnel Applied Research and Training Division,<br />
SNC Defence Products Limited<br />
Ronald L. Crawford PhD, Professor, Concordia University<br />
1. Introduction<br />
Competitive advantage occurs when one protagonist creates and exploits superior relative certainty in<br />
an area which is uncertain or problematic within the industry. Porter, Khandwalla, Waterman & Peters,<br />
and others have proposed typologies of ways in which one can gain competitive advantage (viz. product,<br />
promotion, investment, scope, etc.), but they give a false impression that these represent institutional,<br />
executive level, quantum events. The opposite, in fact, is the typical case. Quinn, Mintzberg, Crawford,<br />
Gram, and Star, Cyert, March, Cohen, Drucker and others emphasize that competitive advantage is more<br />
typically achieved cumulatively through successions of w ram to locally evident ambiguities,<br />
threats, opportunities and variations. In other words, through voluntary improvisations which<br />
people undertake on their own initiative in relation to disciplined actions.<br />
The improvisation process has been studied extensively by members of the SNC Personnel Applied<br />
Research (PAR)team in both military and civilian settings. It corresponds very clearly to the behavioural<br />
theory of the firm, consisting broadly of:<br />
� applying heuristic diagnostic and response skills;<br />
� using experimentation to test beliefs, learn more, and influence the constellation of factors; and,<br />
� creating uncertainty among one’s competitors.<br />
In typical populations, the psychological readiness and capacity to exercise "disciplined initiative", and<br />
thence to improvise, are statistically uncommon. These capacities are characteristically stabilized well before<br />
entry to the work force, and are developed over long periods of intensive investment of time, energy, and<br />
resources. There is substantial evidence, however, that comparable skills can be achieved by adults,<br />
although most current examples tend to be costly and harrowing experiences for the participants. Real<br />
or simulated equivalents of combat, for example, do produce high levels of intuitive problem solving and<br />
experimental learning, but with significant casualty rates and considerable cost.<br />
In this regard, the PAR team has identified the following:<br />
� a method and content which can be employed in broad-based training and development<br />
settings to produce effective improvisations from the application of "Disciplined Initiative";<br />
� a reformulation of that content into a format which retains a high level of psychological<br />
engagement but reduces the resource requirements and real/psychological casualty rates;<br />
� a refinement of the content and method of instruction into a form suitable for field trial in a<br />
military setting; and<br />
� a development, from the field trial, of parallel curricula tailored to the context of other industry<br />
applications and levels of management.<br />
2. Discipline and Initiative<br />
What are the major determinants or sources of initiative? Discipline, on one hand, is acquired by<br />
learning how to deliver predictable and standardized outcomes when appropriately cued (certainty).<br />
Improvisation, on the other hand, is the delivery of a satisfactory outcome when initiative is exercised,<br />
i.e., action is called for but the cues have not been experienced before, nor is there an available<br />
repertoire of rehearsed responses to cope with the situation (uncertainty). A far more complete<br />
Copyright SSC PAR DIV Mar 1990<br />
treatment of these notions in relation to prior research has been done and reported elsewhere.<br />
For the military commander, regardless of appointment, as well as for other vocations, the distinction<br />
between “discipline” and “initiative” is important. The commanders of sections, platoons, companies,<br />
battalions, divisions, corps, and armies who can handle both the determinate and indeterminate<br />
aspects of their responsibilities would appear to possess a number of advantages, as follows:<br />
� the commander would cope more effectively with both foreseen and unforeseen events,<br />
� the commander would require substantially less attention or supervision, and<br />
� the commander would have a greater capacity to assume authority.<br />
Within the context of the military commander some of the research questions we propose to ask in<br />
relation to Discipline, Initiative and the capacity to Improvise are as follows:<br />
� How is discipline developed? � How is initiative developed? � How do discipline and<br />
initiative interact? � How does "disciplined initiative" influence improvisation outcomes?<br />
3. The major propositions are listed as follows:<br />
� the more a person experiences intimate, emotional, idealistic and reinforcing socialization<br />
experience, the more a person will have the propensity to exercise “disciplined initiative” under<br />
conditions of uncertainty;<br />
� the more a person exercises “disciplined initiative” under conditions of uncertainty, the<br />
more a person will be able to exploit available options (improvise) in a battle situation;<br />
� the more opportunities the person has to rehearse battle scenarios under controlled conditions,<br />
the more the person will exercise appropriate "disciplined initiative" decisions and actions;<br />
and,<br />
� the more a person acquires and copes with difficult assignments, the more a person will<br />
continue to exercise “disciplined initiative” under conditions of uncertainty.<br />
The matrix at Figure 1 shows the importance of DISCIPLINED INITIATIVE to the Military Commander.<br />
Clearly, it is important to design and deliver a curriculum of continuing training and education<br />
that will result in the bulk of the Commanders belonging to the upper left quadrant, and none<br />
found in the lower right quadrant.<br />
[Figure 1. Implications of Discipline and Initiative from the Perspective of the Military Commander: a matrix of high and low Discipline against high and low Initiative.]<br />
4. Situational Awareness and the Military Commander<br />
Situational awareness has also been developed to deal with recent reanalyses of the sorts of thinking<br />
that go on under complex and rapidly changing conditions, especially when information inputs<br />
and outputs are degraded by blockages and noises of various kinds and intensities. Essentially,<br />
the Commander must be able to act upon knowledge of himself and his forces and the disposition of the<br />
enemy forces, and anticipate the reaction of the enemy to his initiatives in the context of rapidly changing<br />
conditions and timelines.<br />
The basis of the Commander's action is input information COMMUNICATED to him, mainly<br />
audio (voice) and visual (eye), and output action information COMMUNICATED by him, mainly<br />
audio (voice) and psychomotor (eye-mind-finger-hand). What is of concern in the production of<br />
qualified COMMANDERS is the types and ranges of thinking that must occur in order to decide on,<br />
and communicate, courses of action that are appropriate for given scenarios.<br />
5. Training to Tactical and Strategic Actions<br />
The effectiveness of combat elements depends to a great extent upon the ability of their personnel to<br />
carry out three kinds of actions:<br />
� Highly efficient enactment of predictable routines, such as mobilizations, preparation for<br />
action, decamping, assembly, deployment into and out of movement formation, and establishing<br />
formations for classical types of actions. These are activities which recur with regularity in such<br />
consistent form that a well-drilled unit literally has them down to a well honed science. These are<br />
performed with minimal judgement because the “solution” is already known.<br />
� Applying “Directing Staff Solutions”, or “classical tactics” effectively in appropriate field<br />
and simulated situations. The clearest illustrations of these are the action sequences or drills in small<br />
unit tactical manuals. Those tell the participant what to do and how to do it under most tactical<br />
conditions. Directing Staff solutions require an element of active diagnosis of the context (i.e. a<br />
military appreciation), a choice among alternative responses from a standardized repertoire, and adaptation<br />
of those responses to match situational particulars.<br />
� Improvisation. Patton observed that plans never survive the initial engagement. Substantially<br />
the same sentiments of commanders and theoreticians across millennia demonstrate that under<br />
firing-line conditions, the classical solution sometimes cannot be ascertained, may not apply because<br />
of locally evident threats or opportunities (or may even be counterproductive if it represents definitive<br />
intelligence for the opposing force).<br />
Under all battle conditions improvisations are required. Typical kinds of improvisation include:<br />
� making tentative and partial diagnoses under uncertainty;<br />
� using action to test diagnoses, clarify the context, and alter the context; and<br />
� creating uncertainty for the opposing force.<br />
6. Relationship to Combat Related Doctrinal Training<br />
Training to doctrine usually takes the following three forms:<br />
a) SCRIPTED ROUTINES, comprised of:<br />
� Rationale; Components; Chained components; Whole (Insight - Gestalt).<br />
b) ADAPTED ROUTINES, comprised of:<br />
� Pattern recognition - more of the same situations.<br />
� Repertoire - more routines and variations.<br />
c) IMPROVISATIONS, comprised of:<br />
� An Act-Watch stream.<br />
� A Convergent stream - process of elimination; working backwards; partial solution; simplified modes; analogues.<br />
� An Enacting stream - via networks; forcing errors; tactics of mistakes.<br />
� A Competitive stream - using edge of certainty; creating uncertainty for others.<br />
7. Scripted Routines<br />
Scripted routines are the action equivalent of commodity strategies, or mass production. They<br />
depend for their effectiveness upon speed, precision, predictability and integration of more or less<br />
complex but fixed routines. Realtime thinking is largely replaced by decision loops and redundancy.<br />
The optimal training scenario for such manoeuvre is the rehearsal. In rehearsals, the “big picture”<br />
(e.g., a drill, movement, parade...) is broken down into its constituent components, such as tasks and<br />
actions. These are rehearsed until the trainee achieves complete command. The components are<br />
strung together in progressively longer trains of action until the entire routine is represented. Psychologically,<br />
the process is a direct application of behavioural conditioning (chaining).<br />
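As an illustration only (not drawn from the paper), the chaining progression just described, components rehearsed singly and then strung into progressively longer trains of action, can be sketched as follows; the component names are hypothetical examples, not the authors' drills:

```python
# Illustrative sketch of behavioural chaining: break a routine into
# components, then rehearse progressively longer trains of action
# until the entire routine is represented.

def rehearsal_chains(components):
    """Return the progressively longer trains of action to rehearse,
    ending with the complete routine."""
    return [components[:i] for i in range(1, len(components) + 1)]

# Hypothetical example routine:
drill = ["mobilize", "prepare", "decamp", "assemble", "deploy"]
for chain in rehearsal_chains(drill):
    print(" -> ".join(chain))
```

Each pass adds one more component to the rehearsed train, which is the "chaining" structure the paragraph above attributes to behavioural conditioning.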
The challenge of teaching scripted routines is that the actions involved tend not to be very exciting<br />
or involving. This requires developing imaginative training methods such as competing against the<br />
clock, against scoring systems, or against other teams. Varying the training content will also help:<br />
here it is important to move from board simulations to field simulations at an early stage (e.g.,<br />
convoying in snowstorms, at night without lights...).<br />
8. Adapted Routines<br />
The fundamental training objectives here are to create repertoires of stored situational patterns, and<br />
to match these with “DS Solutions”, or repertoires of behavioural routines appropriate for each situation.<br />
Unlike scripted events, where a workable outcome is effectively guaranteed by rote enactment<br />
of a fixed recipe (e.g., a parade or unobstructed road movement), enactment of adapted routines<br />
requires making more or less continuous judgements and reassessments. Those are necessary, first,<br />
because tactical situations all differ in important detail, and second, because they change as actions evolve.<br />
These judgements are also necessary to modulate actions and to maintain unit control and external<br />
coordination. Under operational conditions there is precious little time or attention for anything else.<br />
That means that the "basics" of situational appreciation (information gathering, interpretation, summarizing<br />
in a working model), then identifying and implementing the appropriate tactical response,<br />
must be almost reflexive.<br />
The classical approaches to situational assessment and theoretical doctrine actually work reasonably<br />
well. That is, breaking the processes into stages, and then working through numerous examples of<br />
each stage, starting with simple, small scale examples and working up to more difficult and complex<br />
examples, then chaining the stages successively into processes. The main problems with contemporary<br />
training are that it is not engaging or realistic enough, that trainers are usually insufficiently<br />
prepared or supported with aids, scenarios and materials, and that trainees do not experience<br />
enough examples, cycles, and variations for recognition and response to become second nature. The<br />
ultimate objective is to prepare someone who can act as if he has seen the situation before, understands<br />
it, can visualize the opponent's perspective, and can select and enact the actions needed to<br />
ensure favourable outcomes. The response of a BATTLE-TASK TRAINING/BATTLEBOARD<br />
based curriculum is to create surrogate experience, and to meter that experience at a controlled but<br />
challenging rate of exposure.<br />
9. Improvisations<br />
The objective behaviour here is taking effective local action under uncertainty or ambiguity that<br />
obviates rational, calculated decisions. These conditions are commonplace, yet they are poorly<br />
addressed (and in some quarters actively denied) in the training curricula. Improvisation accounts<br />
for the extent to which people are able to respond to, understand, exploit, and occasionally create transient,<br />
locally evident threats, opportunities, and ambiguities (versus becoming immobilized or simply<br />
plowing ahead according to the initial set of orders).<br />
Current knowledge of how people respond under uncertainty concentrates upon the interacting<br />
processes of:<br />
� heuristic problem solving,<br />
� learning by experimentation, and<br />
� creating opponent uncertainty & loss of initiative.<br />
To teach to these interacting processes, the simulation scenarios and curriculum are modified. The<br />
scenarios follow a simpler progression than pattern recognition and/or repertoire development.<br />
These scenarios directly link simple to complex and small scale to large scale. However, they will<br />
be made deliberately ambiguous, with cues of increasing subtlety regarding threats, opportunities,<br />
dispositions and intentions. The objectives will be attaining tactical certainty and initiative (i.e.,<br />
bringing the simulation back to an accepted routine format). The role of the umpire/trainer will be<br />
made much more active, as he will effectively be reorienting the scenario to reflect what is learned<br />
from each move as well as the objective outcomes. The discussion emphasis will shift from recognition<br />
to inference. Physically the simulations will employ progressive disclosure. As capacity develops,<br />
options such as fractures in command - coordination can be included. The core issues are:<br />
� how can I think and experimentally work my way into a situation where I know what’s<br />
going on and can employ my tactical handbook, and<br />
� how can I prevent my opponent from getting to that stage first?<br />
10. The Trials and the Setting<br />
The initial focus of the trials is the Army Reserve Unit. Decentralization, resource constraints, limited<br />
personnel time, the Army Reserve Unit's need for an experience payoff which will enhance both the unit<br />
and civilian career opportunities of participants, and the critical role of the Army Reserve Unit in a<br />
scenario of future national defence make the Army Reserve Unit a particularly attractive site for the trials.<br />
Further favouring the choice of an Army Reserve focus is the availability of the training simulation<br />
device BATTLEBOARD, a robust and transportable table-top terrain modeling simulator, and readily<br />
adapted doctrinally based Battle-Task training scenarios, as well as pre-existing knowledge of the<br />
general nature of tactical uncertainties of combat arms units, and the compressed time frames and clear<br />
field testing which Reserve settings offer.<br />
11. Interrupted Training Schedules<br />
Moreover, the Army Reserve Unit training curriculum has not yet been specifically addressed in terms of<br />
the real constraints confronting an Army Reserve Unit. The Army Reserve Unit soldier trains "part time",<br />
while the “Regular Force” soldier can train “fulltime”. Courses and exercises are not interrupted for the<br />
Regulars while they always are for the Reserve Unit. The usual method of fitting training requirements<br />
to Reserve Unit needs is to cut parts out of a curriculum, sequence the course curriculum differently, and/<br />
or stretch it out over a longer time period. Clearly these approaches will not be adequate for enlarged<br />
Reserve Units in a Total Force Army.<br />
Thus, throughout this project there is a concurrent activity devoted to applying the ingredients of a<br />
theory/model of linked learning. This theory/model is needed in order to accommodate the time<br />
available for training the Reserve Unit person. Usually this time is available in “dribs and drabs”.<br />
As a consequence, the curriculum must be parcelled out for the Reserve Unit in such a fashion that<br />
the results of training are the same as for the Regulars, who engage the curriculum as a coherent<br />
whole.<br />
The core assertion of this linked learning notion is that each BATTLE-TASK (e.g. “Advance to<br />
contact” - Infantry alone), is taught “wholistically”, in the teams that are in command, using terrain<br />
models, with the instructor using the “inductive” mode of instruction. The objective is to increase<br />
situational awareness in the team, and enable them to distinguish between "discipline" and "initiative",<br />
to increase the team's comprehension of, and use of, "Disciplined Initiative".<br />
12. Action<br />
Figures 2, 3, and 4, overleaf, portray the format of the trials, which conforms to the action science research<br />
strategy encouraged by Argyris et al. (1985), and the Personnel Applied Research method used by the PAR team.<br />
[Figures 2, 3, and 4. The Personnel Applied Research format and basic design. Combat formation: Infantry. The manoeuvre: advance to contact. The "deliverable": a kill taken from an enemy force. Tactical doctrine: (choose one). Combat formations trained using the BATTLEBOARD training system are compared with combat formations trained using current training systems; for each, one training option is chosen (scripted routines, adapted routines, or improvisations), the action is executed, the result is recorded, and the results are submitted to the comparative analysis.]<br />
COMBAT VEHICLE COMMANDER'S SITUATIONAL AWARENESS:<br />
ASSESSMENT TECHNIQUES<br />
Carl W. Lickteig<br />
Major Milton E. Koger<br />
U.S. Army Research Institute<br />
Field Unit-Fort Knox<br />
Captain Thomas F. Heslin<br />
2nd Squadron, 12th Cavalry Regiment<br />
Fort Knox, Kentucky<br />
Abstract<br />
The ability to "see the battlefield" is critical to<br />
successful execution of the battle. This precept is true at all<br />
echelons including commanders of small units and individual<br />
weapon systems. To train and foster this ability, however,<br />
methods for assessing and enhancing the commander's situational<br />
awareness (SA) are required. Recent efforts (Endsley, 1988;<br />
Fracker, 1988) have focused on the development of objective<br />
measures of fighter pilots' SA. This paper extends this effort to<br />
measures of SA for land combat vehicle commanders.<br />
As part of the Army Research Institute's (ARI) program of<br />
research in support of future Combat Vehicle Command and Control<br />
(CVCC) systems, small unit commander's SA was identified as a<br />
potentially important measure of system effectiveness. Parallel<br />
forms of two SA instruments were developed for objective<br />
assessment of a commander's perception, comprehension, and<br />
projection of the battlefield situation. This paper provides a<br />
description of these SA instruments and their utilization in<br />
support of the CVCC simulation-based program.<br />
Background<br />
The combatant's SA represents his knowledge of the world and<br />
his role in it. SA includes both lower and higher order mental<br />
processes ranging from the simple perception of individual<br />
elements of the situation to an assessment of their meaning and<br />
impact on immediate and overall mission objectives. Endsley's<br />
model of SA details three distinct levels--perception,<br />
comprehension, and projection--included in the following<br />
definition of SA: "...the perception of the elements in the<br />
environment within a volume of space and time, the comprehension<br />
of their meaning, and the projection of their status in the near<br />
future" (Endsley, 1988, p. 97).<br />
For ground forces, SA is more commonly described as the<br />
commander's ability to "see" the battlefield in relation to his<br />
mission and the overall mission. Combined arms combat,<br />
particularly for ground systems, entails coordination and support<br />
of multiple units. Situational awareness for combined arms<br />
commanders must include, perhaps more so than for combat pilots,<br />
the context of the combined mission.<br />
Typically a commander's awareness of a combat situation<br />
begins with the assignment of his unit's mission embedded in the<br />
concept or schema of the overall mission that his unit is<br />
supporting. The mission specifies the area of operations on the<br />
battlefield, the place(s) in the world that the commander is to<br />
occupy, as well as the objectives and time frame driving mission<br />
pace. The mission brief and order of operations describe the<br />
known and suspected enemy forces and activities in that area, key<br />
terrain features and locations related to mission accomplishment,<br />
and friendly combat, support, and service support units<br />
responsible for mission execution.<br />
Once the battle commences, the commander's perception<br />
(Endsley's SA Level 1) of the situation is enhanced by the direct<br />
or reported detection of enemy units. When initial contact and<br />
spot reports are received by the commander, his perception of the<br />
situation must be quickly updated. As a commander, he must also<br />
attempt to comprehend (SA Level 2) this information and its<br />
significance to his unit and mission. Given the reported type<br />
and number of enemy units detected, he may begin to estimate the<br />
size and type of the overall force committed, their weapon<br />
systems and range, their organization and support.<br />
As his understanding of the situation develops, the commander<br />
begins to project (SA Level 3) or reassess probable courses of<br />
action. Given the location and heading of units reported and his<br />
estimate of force structure, he may begin to calculate when, or<br />
if, the main unit will reach his location, at what point he may<br />
need to displace his unit from their current location, and what<br />
impact the current situation will have on the future situation<br />
such as his unit's next proposed location.<br />
This effort in SA development for ground systems is part of<br />
ARI's program of research in support of future CVCC systems.<br />
These systems will provide ground vehicle commanders a unique<br />
capability for the digital communication of text and graphic<br />
battlefield information, in addition to conventional FM radio.<br />
The CVCC program objective is the development of soldier-tested<br />
specifications for future automated command and control systems<br />
for ground combat vehicles. ARI conducts simulation-based tests<br />
of prototype CVCC systems using the Armor Center's Close Combat<br />
Test Bed (CCTB), formerly Simulation Networking-Developmental<br />
(SIMNET-D), at Fort Knox.<br />
Simulation-Based Methodology<br />
An objective measure of commander's SA is based on a<br />
comparison of the actual situation with the commander's<br />
assessment or report of the situation. Maintaining an accurate<br />
knowledge of the battlefield situation, however, is difficult for<br />
both commanders and SA researchers. For the latter, simulation-based<br />
scenarios provide a capability to control and know the<br />
battlefield situation.<br />
To ensure an accurate knowledge of the actual situation at<br />
the time of SA assessment, a set of battlefield situations,<br />
vignettes, were developed in which all the informational elements<br />
pertaining to the situation were prespecified and prerecorded.<br />
Prespecification ensured standardization of situation<br />
determinants.<br />
Prerecorded materials for the vignettes included simulation-based<br />
files designating commander and friendly unit locations,<br />
operational overlays to be displayed on the commander's Command<br />
and Control Display (CCD), and message sets to be received on his<br />
CCD during the vignette which would provide updates on his<br />
battlefield situation.<br />
At the start of each vignette, the<br />
commander was provided a map and map board with acetate<br />
operational and note overlays, and a brief description of the<br />
battlefield situation leading up to the vignette.<br />
The tactical situation for the vignette placed the commander<br />
in his tank simulator occupying a stationary defensive battle<br />
position (BP) for a delay-in-sector mission. The time frame for<br />
the vignette began after the postulated successful delay of<br />
initial enemy elements by his unit. The vignette was terminated<br />
prior to his unit's displacement to a subsequent BP.<br />
Immediately after a 10-minute message reception and<br />
processing phase, the commander was escorted out of his simulator<br />
to an adjacent workstation. He retained the map and map board<br />
used while receiving messages, but the operational and note<br />
overlays were replaced with another acetate sheet depicting only<br />
his own BP and the BPs of the adjacent companies. The commander<br />
was given one version of both the plotting and "seeing"<br />
questionnaires, described in the following section, and 10 minutes<br />
to record his answers. For each vignette, one<br />
questionnaire pertained to the current situation and the other to<br />
the future situation in a counterbalanced sequence across the<br />
series of vignettes.<br />
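As a hedged sketch only (the paper does not specify the authors' actual assignment procedure), the counterbalancing described above can be realized by alternating which questionnaire comes first on successive vignettes:

```python
# Assumed counterbalancing scheme: alternate which questionnaire
# (current vs. future situation) is administered first across the
# series of vignettes, so order effects balance out.

def counterbalance(n_vignettes):
    """Return the (first, second) questionnaire order for each vignette."""
    orders = []
    for i in range(n_vignettes):
        if i % 2 == 0:
            orders.append(("current", "future"))
        else:
            orders.append(("future", "current"))
    return orders
```

With an even number of vignettes, each questionnaire appears first equally often.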
SA Instruments<br />
The primary goals in the development of the situational<br />
measures for this effort were (a) to develop a set of items that<br />
addressed each of the primary levels of SA for small unit ground<br />
commanders, and (b) to develop a response format that supported<br />
objective scoring of a commander's SA responses.<br />
Perception: Plotting<br />
To capture the commander's perception of the situation, a<br />
situational awareness form was developed which required<br />
commanders to plot on a military map the locations of reported<br />
enemy units, friendly units, and key control measures. The<br />
location data selected for these items were based on SMEs'<br />
Table 1<br />
Situational Awareness Items: Plotting the Battlefield Situation<br />
Current Situation              Future Situation<br />
Largest unit engaged           Support unit to rear<br />
Largest unit approaching       Company's subsequent BP<br />
Friendly scout unit            Obstacle(s) to rear<br />
Target reference points        Enemy scouts to rear<br />
Largest unit outside sector    Mortar unit to rear<br />
estimates of the more important location information provided<br />
during the vignette. A five-item series of plotting questions<br />
was developed for both the commander's current situation and<br />
future situation as indicated in Table 1.<br />
The current situation was defined by informational elements<br />
of more immediate concern to the commander including enemy<br />
elements currently being engaged by his unit. The future<br />
situation was defined by less immediate information including<br />
enemy units in the area but well beyond current range, or<br />
information related to his next location, the subsequent BP.<br />
Comprehension and Projection: "Seeing"<br />
To assess the commander's comprehension and projection of the<br />
battlefield situation, a second SA form was developed. Items on<br />
this form required commanders to compile isolated report<br />
information into aggregate reports, to estimate the size of<br />
designated enemy units including main and attacking units, and to<br />
project the impact of the information received on his unit's<br />
current and future situations. Five close-ended items were<br />
developed for both the current and the future situation (Table 2).<br />
For the current situation the items addressed the commander's<br />
ability to comprehend the more immediate battlefield situation to<br />
the front of his current BP. The first two items required him to<br />
compile reported information received during the vignette into<br />
summary reports detailing the number and type of enemy units<br />
destroyed and damaged by his company, and the number and type of<br />
enemy units still approaching his current BP. The remaining<br />
items addressed the commander's ability to go beyond the data<br />
actually reported, to understand the nature of the threat facing<br />
both his company unit and the overall task force. These items<br />
asked the commander to estimate in turn the size and type of the<br />
enemy unit actually engaged, the unit approaching his company,<br />
and the total unit committed against the overall task force.<br />
Table 2<br />
Situational Awareness Items: "Seeing" the Battlefield Situation<br />
Current Situation                Future Situation<br />
Number & type enemy damaged      Distance/direction to main unit<br />
Size & type unit engaged         Heading of main enemy unit<br />
Number & type unit approaching   ETA main unit < 2,000 meters<br />
Size & type force approaching    Distance/direction next BP<br />
Overall size & type unit         Impact of obstacle(s) on unit's<br />
confronting the task force       next BP<br />
For the future situation, the items addressed the commander's<br />
ability to project beyond his immediate situation and use the<br />
information provided during the vignette to anticipate upcoming<br />
events. The initial items focused on the commander's awareness<br />
of the main enemy unit approaching his company sector. Reports<br />
received during the vignette had provided information about the<br />
heading and location of a relatively large enemy unit in the<br />
company's sector but well beyond engagement range. The commander<br />
was required to provide the location and heading of this main<br />
unit, and then estimate if, and when, that unit would approach<br />
within 2,000 meters of his current location.<br />
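The ETA judgement described above is, at heart, closing-rate arithmetic. A minimal sketch, using hypothetical figures that are not drawn from the vignettes:

```python
# Illustrative arithmetic only: minutes until an approaching unit closes
# to within a threshold range, given its reported range and closing speed.
# All numbers here are hypothetical examples.

def eta_to_range(current_range_m, closing_speed_kmh, threshold_m=2000):
    """Minutes until the unit is within threshold_m; None if not closing."""
    if closing_speed_kmh <= 0:
        return None  # unit is stationary or withdrawing
    if current_range_m <= threshold_m:
        return 0.0   # already inside the threshold
    metres_per_min = closing_speed_kmh * 1000 / 60
    return (current_range_m - threshold_m) / metres_per_min

# A unit reported at 8,000 m closing at 30 km/h:
print(eta_to_range(8000, 30))  # 12.0 (minutes)
```

The commander's version of this calculation is done mentally from reported locations and headings; the point of the sketch is only the underlying distance-over-speed relationship.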
The final two items assessed the commander's awareness of key<br />
information related to his unit's proposed future location. One<br />
item asked him to provide estimates of distance and direction to<br />
his unit's subsequent BP, and the final item asked him to assess<br />
the impact that reported obstacle(s) might have on movement to,<br />
and occupation of, that BP.<br />
Objective Responses: Scoring<br />
A key concern in the construction of the SA items was to<br />
develop a set of questions that clearly specified the situational<br />
information requested. The simulation-based vignettes driving<br />
the scenarios were designed by subject matter experts (SMEs) to<br />
provide a wide range of battlefield reports of differing<br />
relevance to the commander's mission. To ensure commanders<br />
clearly understood what information was being requested for each<br />
item, special attention was given to item wording. The item<br />
stems consistently provided and emphasized, for example,<br />
distinctions between enemy units engaged versus not engaged,<br />
locations in the unit's sector versus adjacent sectors, and<br />
elements to the front versus the rear of the unit's BP location.<br />
To meet the goal for SA instruments that could be objectively<br />
scored, the response formats required commanders to provide<br />
178
answers that precisely indicated their knowledge of the<br />
information requested. For items in which commanders were<br />
required to plot the locations of designated elements, objective<br />
assessment of location accuracy was straightforward. For the<br />
remaining items directed at comprehension and projection of the<br />
situation, a combination of fill-in-the-blank (e.g., enemy type,<br />
number) and multiple choice (e.g., mechanized rifle battalion<br />
versus tank company) item formats were used. SMEs assisted in<br />
the construction of all response options to provide commanders<br />
appropriate and meaningful response alternatives.<br />
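As an illustration, objective scoring of the location-plotting items reduces to comparing plotted against actual map coordinates. A minimal sketch follows; the grid coordinates, tolerance, and pass criterion are illustrative assumptions, not values from the instrument itself.

```python
import math

def score_plot(plotted, actual, tolerance_m=250.0):
    """Score one location-plotting item.

    Returns the plotting error in meters and whether it falls
    within an illustrative tolerance (not the instrument's
    actual criterion).
    """
    error = math.dist(plotted, actual)  # straight-line error in meters
    return error, error <= tolerance_m

# Hypothetical item: the commander plots the engaged enemy unit
# 200 m east of its actual grid location.
error, within = score_plot(plotted=(32200.0, 47000.0),
                           actual=(32000.0, 47000.0))
```

The same rule applies unchanged to each plotted element, which is what makes location accuracy, as the paper notes, straightforward to assess objectively.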
Two pilot sessions with active duty Armor commanders, three<br />
platoon leaders and one company commander per pilot, were<br />
conducted to obtain user feedback on the SA procedures and items<br />
developed. During the first pilot, commanders provided detailed<br />
feedback during structured debriefs. Their comments were<br />
particularly helpful in identifying items that required clearer<br />
or more explicit wording. Their recommendations were incorporated in<br />
revisions to the SA measures, and the revised questionnaires used<br />
for the second pilot appeared quite adequate.<br />
SA Utilization<br />
The SA forms are currently being used in ARI's CVCC program<br />
of research to investigate small-unit commanders' information<br />
requirements. An initial evaluation compared commanders' SA as a<br />
function of message sets received on their CCD that differed in<br />
volume, number of messages per set, and relevance to their<br />
battlefield situation. Results of this effort are expected to<br />
provide recommendations for improving the design of this future<br />
automated command and control system. In addition, these data<br />
will be used for empirical validation of the SA method and<br />
instruments described.<br />
A follow-on baseline evaluation of commanders using only<br />
conventional FM radio systems without a CCD will provide<br />
comparison data on the speed and accuracy of the CCD for<br />
receiving and relaying battlefield communications. As an<br />
additional dependent measure, the SA instruments will provide<br />
comparison data on the CCD's ability to help the commander<br />
integrate command and control information into a more accurate<br />
awareness of his battlefield situation.<br />
179<br />
An Aviation Psychological System for Helicopter<br />
Pilot Selection and Training<br />
F.Fehler<br />
Consulting Psychologist, German Army Aviation School, Bückeburg<br />
1. Current Situation in Aviation Psychology<br />
In Germany, aviation psychology looks back on an impressive<br />
history which had its beginnings way back in 1916, as some<br />
mythical accounts would have it. Although it is untrue that the<br />
late "Red Baron" made the acquaintance of aviation<br />
psychologists, it is certainly true to say that all German<br />
military pilots since the end of WW I have been confronted with<br />
aviation psychology in one way or another, if not with an<br />
actual aviation psychologist, then at least with aviation<br />
psychology methods and instruments. As a general rule, such<br />
instruments would include paper and pencil tests, and boxes<br />
with all kinds of levers, buttons, lights and bells. In the<br />
sphere of aviation, psychology was essentially synonymous with<br />
pilot candidate selection. Presumably this is also true for<br />
other countries where aviation psychology is practiced.<br />
On the other hand, aviation psychologists were surprisingly<br />
hesitant in touching two other important areas of aviation,<br />
namely<br />
- pilot training<br />
- psychological support for aviators.<br />
Obviously, this is a short-sighted attitude, for it is the<br />
training that will show whether or not the previous<br />
psychological screening methods were successful. Psychologists<br />
should therefore attend flight training, either by making<br />
active contributions or by merely acting as observers, to<br />
ensure that the criteria applied to conducting the training and<br />
to assessing the achievements made are the same as those that<br />
were applied to devising and evaluating their own test methods.<br />
Any other approach would not lead to representative validation<br />
coefficients.<br />
An aviation psychologist who descends from the heights of his<br />
ivory tower research to offer his knowledge to an aviation<br />
school and commit himself to solving its practical, everyday<br />
problems will soon find himself left alone and discover that he<br />
does not have the psychological tools required. What is the<br />
reason for this and what can be done about it?<br />
180
2. Identifying The Problem<br />
2.1. Screening Methods Used By Aviation Psychologists<br />
The need to screen pilot candidates is undisputed; screening not<br />
only serves the purpose of making the training cost-effective,<br />
it is also intended to save unsuitable applicants from having to<br />
abort a career. The problem of choosing suitable applicants seems to be<br />
an easy task for the practitioner in psychology, as he can choose<br />
freely from a plenitude of psychological methods that have been<br />
accumulated by two generations of psychologists having done<br />
extensive research in this particular area. On looking more<br />
closely at the existing literature, however, he will discover the<br />
following: the best achievements made so far are validation<br />
coefficients that lie at r = .5 in the most favorable cases!<br />
For the selection of applicants this means that he has to apply<br />
the most uncompromising cut-off-scores if he wants to satisfy the<br />
management with less than 10 % of candidates who have to be washed<br />
out from pilot training. It is obvious that such an approach would<br />
be synonymous with a sharp increase in the percentage of<br />
mistakenly rejected candidates, which is totally unacceptable<br />
unless one can draw on a large number of applicants. The latter is<br />
not the case in German army aviation. This means that the<br />
conventional testing methods are exhausted. Now that the old test<br />
methods have been modified and renamed over decades, it is hard to<br />
imagine the occurrence of a major breakthrough yielding validation<br />
coefficients that are clearly above r = .5.<br />
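The trade-off described above can be illustrated with a small Monte Carlo sketch: with a predictor of validity r = .5, pushing the washout rate among those selected down requires a high cutoff, which in turn rejects most of the suitable applicants. All figures below (the cutoff, the criterion, the 50% suitability base rate) are illustrative assumptions, not data from the German selection program.

```python
import math
import random

def selection_tradeoff(r=0.5, cutoff=1.5, criterion=0.0,
                       n=100_000, seed=1):
    """Monte Carlo sketch of selection with a predictor of validity r.

    Each applicant has a test score x and a later training outcome y,
    both standard normal with correlation r.  Applicants with
    x >= cutoff are selected; selected applicants with y < criterion
    wash out.  Returns (washout rate among selected, share of
    suitable applicants mistakenly rejected).
    """
    rng = random.Random(seed)
    selected = washed_out = suitable = rejected_suitable = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        if y >= criterion:
            suitable += 1
            if x < cutoff:
                rejected_suitable += 1
        if x >= cutoff:
            selected += 1
            if y < criterion:
                washed_out += 1
    return washed_out / selected, rejected_suitable / suitable

washout, false_reject = selection_tradeoff()
```

Under these illustrative settings the washout rate among those selected stays modest, but the large majority of suitable applicants never make the cut, which is tolerable only when one can draw on a large applicant pool.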
2.2. Pilot Training<br />
The contribution aviation psychology has made to pilot training is<br />
in fact very small when one looks at the contributions made in the<br />
field of pilot candidate selection. There is not much point in<br />
taking enormous pains to choose student pilots and then abandon<br />
them to their fates. In the German Army Aviation Branch, the<br />
psychologist will look after every student pilot who gets into<br />
trouble in the course of flight training. This approach has helped<br />
to gain the following experience:<br />
Problems resulted from<br />
- flight instructors being unable to establish a personal<br />
relationship with the student<br />
- vague learning objectives which were not clearly understood by<br />
students and interpreted differently among instructors<br />
- inconsistent teaching methods<br />
- the structure of the training program being based on too<br />
demanding a learning progression and paying no attention to<br />
students' individual training needs and learning speeds.<br />
181
In his attempt to overcome these difficulties, the aviation<br />
psychologist learns that he lacks an important tool: to develop<br />
and to test training concepts he must have free access to a full<br />
mission simulator and to helicopters. Availability and flexible<br />
use of both these types of systems under varying experimental<br />
conditions are extremely limited due to severe operational<br />
restrictions.<br />
2.3. Further Tasks Of Aviation Psychology<br />
The final success of the screening methods used in aviation<br />
psychology also depends on the following aspects:<br />
+ training provided to instructors<br />
+ prophylactic stress prevention and coping programs<br />
+ analysis of pilot behavior displayed in the cockpit.<br />
From all these tasks one can conclude that the aviation<br />
psychologist needs a simulator-like system the primary asset of<br />
which is the capability to simulate psychological demands rather<br />
than maximum realism in terms of aircraft control.<br />
3. Solution<br />
The solution to the problems arising from pilot candidate<br />
selection, flight training, ergonomics and psychological support<br />
must be looked for in the cockpit. However, the cockpits of full<br />
mission simulators or actual aircraft are not suited for the<br />
purposes of a scientific analysis for various reasons. This means<br />
that an aviation psychologist has to develop his own cockpit<br />
optimized for his specific aims. Such a concept will be described<br />
in the following paragraphs. The procurement of this system, called<br />
the "Aviation Psychological System/Helicopter (APS/H)", for the German<br />
Army Aviation Branch has already been initiated.<br />
3.1. Task Structures<br />
The task structures of the APS/H are based on psychological problems<br />
that typically arise in the course of actual training missions:<br />
3.1.1. Problem Area: Psychomotor Control<br />
The intricacies of controlling an aircraft may lead to psychomotor<br />
problems. The APS/H will therefore be equipped with all the<br />
controls that can be found in a helicopter. Control inputs made<br />
with the APS/H controls will not only convey the same feeling as<br />
those of a real helicopter; it will also be possible to change the<br />
control sensitivity over a wide range.<br />
182
3.1.2. Problem Area: Handling Complexity<br />
Sophisticated helicopter cockpits give rise to individual handling<br />
difficulties. It will therefore be possible to analyze these<br />
difficulties by switching displays and control elements on and off<br />
and thereby vary handling complexity.<br />
3.1.3. Problem Area: Mission Demands<br />
Certain flight missions and the demands that go with them may<br />
overtax the pilot mentally or physically. The APS/H therefore will<br />
have the capability to simulate all individual tasks and<br />
requirements pilots have to fulfil during typical missions.<br />
3.1.4. Clinical Aviation Psychology<br />
The special mission requirements inherent in flying military<br />
helicopters may also push experienced pilots to their performance<br />
limits. The APS/H is therefore designed such that flying-related<br />
requirements and psychotherapeutical measures (e.g.<br />
desensitisation, autogenic training etc.) can be combined with one<br />
another.<br />
3.1.5. Aviation Psychological Research<br />
It is a matter of course that designers of sophisticated equipment<br />
develop special test set-ups to be applied throughout the test<br />
phase in order to analyze and evaluate the operating performance<br />
of the device under development. APS/H will be a similar tool<br />
which will not only be used for solving ergonomic problems but<br />
also for<br />
- developing training systems and methods, and for<br />
- testing crew concepts.<br />
3.2. Realism Requirements For The APS/H<br />
Simulator realism is not an end in itself. An improvement in<br />
realism does not automatically improve simulator efficiency. An<br />
increase in realism will primarily go hand in hand with a linear<br />
increase in cost. So what is the degree of realism required for<br />
the APS/H to ensure maximum efficiency?<br />
3.2.1. Motor Realism<br />
Operation of the controls and the latency between control input and<br />
instrument display need to be as close to reality as possible,<br />
since automatic handling patterns internalized by the pilots would<br />
make it difficult to overcome the effects of negative transfer.<br />
183<br />
3.2.2. Motion Realism<br />
Motion is essentially perceived by the visual organs. APS/H can<br />
therefore do without a motion system. Nevertheless, vibrations<br />
typical of helicopter flying will be generated by fitting a<br />
vibration device to the pilot seat.<br />
3.2.3. Visual System Realism<br />
The APS/H needs a visual system for the following purposes:<br />
+ flight attitude-related visual feedback<br />
+ visual cueing for landing approaches<br />
+ projection of obstacle images for nap-of-the-earth flying<br />
+ projection of images of prominent terrain features for visual<br />
navigation.<br />
All these images will be schematic in nature. It should be clear<br />
that no gain will be made in dealing with the above-mentioned<br />
tasks by adding the image of leaves to the trees simulated.<br />
3.2.4. Acoustic Realism<br />
Realism in motion cueing will be enhanced by a realistic<br />
simulation of environmental sound patterns. This creates the need<br />
for an acoustic system with dummy-head microphone quality via a<br />
head set.<br />
All in all, a detailed analysis shows that the level of realism<br />
required for the APS/H need not be extraordinarily high to serve<br />
its purpose. Especially in the field of visual systems design,<br />
schematic images will do and thereby reduce overall costs.<br />
4. Summary<br />
Conventional test methods (paper/pencil etc.) are firmly<br />
established tools to be applied in all phases of pilot candidate<br />
selection processes, but it should be borne in mind that their<br />
validity is limited and cannot be improved considerably as can be<br />
seen from the experience gained. An in-depth analysis of the<br />
psychological potential required for meeting flying demands<br />
presupposes the existence of methods that are in keeping with<br />
real flying demands and, additionally, permit the application of<br />
scientific-experimental criteria. When one looks at physicists<br />
who, in search of minute particles, venture to demand equipment<br />
of inconceivable dimensions and are actually provided with it,<br />
then it is justified to say that the outlined APS/H, designed to<br />
study behavioral patterns of helicopter pilots, is a fairly<br />
modest demand.<br />
184
Analyzing User Interactions<br />
With Instructional Design Software<br />
J. Michael Spector<br />
Daniel J. Muraida<br />
Air Force Human Resources Laboratory<br />
Brooks AFB, TX 78235-5601<br />
Abstract<br />
Many researchers are attempting to develop<br />
automated instructional design systems to guide subject<br />
matter experts through the courseware authoring<br />
process. What appears to be lacking in a number of<br />
existing research and development efforts, however, is<br />
a systematic method for analyzing the interplay between<br />
user characteristics, the authoring tool's structure<br />
and organization, and the resulting quality of<br />
computer-based instruction (CBI). This paper describes<br />
the initial application of a particular approach that<br />
focuses on the analysis of inputs, processes, and<br />
outputs that occur in human-computer interactions (HCI)<br />
between end users and a prototype of a CBI design tool.<br />
Instructional Systems Design (ISD) is an established process<br />
for designing and developing instructional materials. ISD models<br />
were first elaborated in the 1950's using a behavioral learning<br />
paradigm and have since undergone many revisions and refinements<br />
(Andrews & Goodson, 1980). Traditionally, ISD has been viewed as<br />
the practical application of knowledge about learning and tasks<br />
to be learned to the design of instruction (Gagne, 1985).<br />
Many researchers have pointed out the need to provide an<br />
update of ISD based on the findings of cognitive science<br />
(Tennyson, 1989). What is also needed is an update of ISD that<br />
takes into account computer-based interactive methods for<br />
presenting instruction (Muraida, Spector, & Dallman, 1990).<br />
Using computers to design, develop, and deliver instruction<br />
complicates ISD considerations. Some instructional strategies<br />
appropriate for certain classroom-based settings are not<br />
appropriate for certain computer-based settings. For example,<br />
some common classroom strategies involve the teacher making<br />
provocative statements and asking leading questions. Likewise,<br />
it is possible to construct alternate computer models of various<br />
devices and simulate their performance; this is not easily<br />
possible in a classroom. As a result, instructional strategy<br />
differences exist between classroom and computer settings.<br />
In addition, the design of computer-based instruction (CBI)<br />
185<br />
must be accomplished with great care. In a classroom, there is<br />
usually an alert and experienced teacher to compensate for<br />
unclear or inadequate instructional presentations. In a computer<br />
setting, it is essential that the initial instruction be clear;<br />
otherwise, the instruction is likely to fail. Courseware is<br />
computer software that is designed for instructional purposes.<br />
Courseware that is not carefully designed is most likely to be<br />
expensive and ineffective (Jonassen, 1988). As a consequence, to<br />
make optimal use of CBI it will be necessary to develop<br />
techniques for evaluating the success and efficiency of various<br />
ISD methodologies applied in computer-based settings.<br />
Problem<br />
CBI has proven to be an appropriate instructional solution<br />
in many settings (Hannafin and Peck, 1988). CBI has also proven<br />
to be expensive and often ineffective (MacKnight & Balagopalan,<br />
1989). What is needed, then, is a means to ensure that CBI<br />
course designs are effective and produced in a cost-effective<br />
manner.<br />
There are two aggravating factors to this problem: 1) It is<br />
often true that courseware developers have had no special<br />
training in computer-based methodologies, and 2) It is not<br />
completely clear what cognitive aspects of learning are best<br />
instructed using various computer-based methodologies. In short,<br />
in determining how to optimize CBI developments it will be<br />
necessary to determine how novice and experienced CBI developers<br />
interact with the courseware authoring environment, and it will<br />
also be necessary to evaluate the success of the resulting<br />
courseware.<br />
The methodology proposed below represents an attempt to<br />
build an initial model of CBI authoring that can eventually be<br />
used as a predictor of success when combining particular<br />
courseware authoring environments, CBI developers, subject<br />
matter, and student populations. The Air Force Human Resources<br />
Laboratory (AFHRL) is interested in refining this model in order<br />
to evaluate the usability of transaction shells (Merrill, Li, &<br />
Jones, 1990) in the Advanced Instructional Design Advisor (AIDA),<br />
an automated and integrated set of tools to facilitate and guide<br />
the process of developing effective courseware (Muraida &<br />
Spector, 1990).<br />
The AIDA project focuses on the design and development of<br />
CBI (Spector, 1990). It is assumed that the Air Force will<br />
continue to expand its use of CBI, that the Air Force will<br />
continue to experience a shortage of courseware authors with<br />
backgrounds in instructional technology, and that the subject<br />
matter of immediate interest is maintenance training for<br />
apprentice level maintenance personnel.<br />
To provide CBI design guidance consistent with these<br />
assumptions, AFHRL has decided to pursue the use of intelligent<br />
186
lesson templates. Intelligent lesson templates have preestablished<br />
instructional parameters and are executable upon<br />
input of informational content by a subject matter expert. In a<br />
sense, intelligent lesson templates "know how" to present the<br />
kind of instruction they contain. Experienced instructors can<br />
alter the instructional parameters in order to customize<br />
instruction. The most noteworthy intelligent lesson templates<br />
are Merrill's transaction shells (Merrill et al., 1990).<br />
AFHRL and Merrill signed a Memorandum of Agreement wherein<br />
Merrill loaned two transaction shells to AFHRL for purposes of<br />
evaluation. AFHRL is using these transaction shells to develop a _<br />
model of CBI authoring interactions that affect the productivity<br />
and the quality of developed CBI courseware.<br />
Methodology<br />
The purpose of the initial evaluation study of Merrill's<br />
transaction shells was to develop a working model of user<br />
interactions with instructional design software. In addition to<br />
determining if Merrill's transaction shells with particular user<br />
interfaces were worthy of refinement and continued development,<br />
the aim was to establish an initial model with relevant<br />
characteristics that predict user success with other authoring<br />
environments.<br />
The answer to the question about the value of using<br />
transaction shell technology is that transaction shell technology<br />
appears to provide a very usable and productive courseware<br />
authoring environment. Details are elaborated in subsequent<br />
sections of this report.<br />
The primary question, however, concerned the establishment<br />
of a model of courseware authoring interactions that would<br />
influence the productivity and quality of a CBI authoring<br />
environment. Because all of the relevant characteristics were<br />
not known ahead of time, an approach that allowed iterative<br />
refinement of a quantifiable and predictive model was required.<br />
Falk's soft modeling technique satisfied this requirement and was<br />
used to guide the design of the study (Falk, 1987).<br />
The initial phase of developing a soft model consists of<br />
identifying inputs, processes, and outputs that are relevant to<br />
the task being modelled. Weighted links between input and<br />
process measurements and output measurements are then<br />
hypothesized. Additional subjects are then tested using the<br />
proposed tentative model. The model and its associated measures<br />
and weights are modified to reflect the outcome of new subjects.<br />
New input, process, or output measurements may be added as deemed<br />
necessary in the model development phase. Over time, the model<br />
stabilizes and can be used as a predictive or analytical tool.<br />
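The refinement cycle described above can be sketched as a set of weighted links that each new observation nudges toward the observed outcome. The sketch below uses a plain least-squares gradient step as a toy stand-in for the idea, not Falk's actual estimation procedure, and all measurement values are hypothetical.

```python
def update_weights(weights, inputs, observed_output, lr=0.05):
    """One refinement step for an illustrative soft model.

    The model links input/process measurements to an output
    measurement through weighted connections; each new subject's
    data nudges the weights toward the observed outcome via a
    simple least-squares gradient step.
    """
    predicted = sum(w * x for w, x in zip(weights, inputs))
    error = observed_output - predicted
    return [w + lr * error * x for w, x in zip(weights, inputs)]

# Hypothetical subject: normalized (instructional experience,
# subject-matter experience, computer experience) inputs, with
# normalized courseware quality as the observed output.
measurements, quality = [0.5, 1.0, 0.2], 0.8
weights = [0.0, 0.0, 0.0]
for _ in range(200):              # repeated refinement passes
    weights = update_weights(weights, measurements, quality)
```

Over repeated passes the predicted output converges on the observed one, mirroring the way the soft model is said to stabilize into a predictive tool as subjects accumulate.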
Initial input measures for this soft model included the<br />
187
following: instructional experience, subject matter experience,<br />
computer experience, and cognitive style. Some of this data was<br />
gathered by direct questioning and was easily quantified (e.g.,<br />
number of years of teaching experience). Cognitive style was<br />
determined by questioning and by observation, and was not as<br />
easily quantified. Some aspects of computer experience were<br />
easily determined by questioning (e.g., number of computer<br />
courses taken), but other aspects were not as straightforward<br />
(e.g., level of expertise with an operating system).<br />
Initial process measures for this soft model included the<br />
following: time spent on an authoring event, sequence chosen for<br />
authoring events, number of revisions attempted and accomplished,<br />
and purpose of revisions. Again some of these processes were<br />
easy to measure and to quantify, but other processes were more<br />
difficult to assess. For example, it was easy to measure how<br />
long an author spent indicating the particular function of a<br />
device that was part of the lesson content. However, determining<br />
the purpose of a particular revision without interrupting the<br />
integrity of the authoring process was more difficult. The only<br />
way to accomplish this was to note that a revision had been made<br />
and look at it; if its purpose was not obvious (correcting a<br />
misspelled word, for example, was an obvious revision), the<br />
author was asked after the session about the purpose of the<br />
revision.<br />
Initial output measures for this soft model included the<br />
following: total time to produce the lesson module, total cost<br />
to produce the lesson module, student achievement on tests,<br />
retention, student motivation concerning the material, level of<br />
interactivity of the lesson, instructor motivation to use the<br />
authoring environment in the future, and peer review by other<br />
instructional developers. Once again some of these measures are<br />
direct and straightforwardly quantifiable (e.g., total<br />
development time, student scores, etc.), while some are indirect<br />
and more qualitative (e.g., instructor and student motivation).<br />
The initial subject was observed completing a lesson module<br />
to teach the names, locations, and functions of 125 parts in<br />
the T-37 cockpit. The subject's experience was determined in an<br />
extensive interview prior to the study. The subject's motivation was<br />
observed throughout the study. In addition, the subject was<br />
queried midway through the study concerning his progress and<br />
problems encountered. The subject also kept a diary of authoring<br />
events, including problems encountered and general impressions.<br />
Results<br />
The relevant input measures of the subject were as follows:<br />
1) Medium instructional experience, 2) High subject matter<br />
experience, 3) Low computer experience, and 4) Reflective<br />
cognitive style with a self-directed locus of control. A formula<br />
for connecting each of these factors with output measures is<br />
currently being developed and will be tested in the second<br />
188
iteration of the evaluation study.<br />
The relevant process measures were as follows: 1) 4.75<br />
hours in introductory exercises, 2) 14.25 hours in on-line<br />
authoring, 3) 11.83 hours in off-line design and planning, 4) 10<br />
groupings, nested 3 levels deep, with a total of 21 lesson<br />
modules, top level module completed first, teaching 125 parts, 5)<br />
20 picture files identified and utilized, with minor revisions<br />
requested for 4, 6) Approximately two minor revisions per module,<br />
7) Approximately 5 minutes of debugging per individual module,<br />
and 8) Complete linkage of all modules into a course module in 20<br />
minutes. This data was collected by observation. The software<br />
has since been modified to collect and record this data<br />
automatically (Canfield & Spector, 1990).<br />
The relevant output measures were as follows: 1) 30.83<br />
hours in total development time (graphics were produced by<br />
support personnel and graphic production time is not included),<br />
2) 3-plus hours expected for student instructional time, 3) cost<br />
data not available, 4) student scores and motivation not<br />
available, 5) medium level of interactivity, 6) high instructor<br />
motivation (wants to be included in follow-on studies), and 7)<br />
acceptable quality of courseware (will be administered to cadets<br />
in lieu of current instruction).<br />
The subject's diary and responses to interview questions<br />
indicated a sustained high level of motivation and satisfaction<br />
with the authoring tool in spite of known deficiencies<br />
(occasional mouse failures). The subject experimented with<br />
default instructional parameters during the exercises but rarely<br />
changed the defaults for the instruction he developed. More<br />
specifically, the subject chose timed presentations for the<br />
student practice interaction rather than learner control. The<br />
subject also modified the default testing parameters to reflect 3<br />
samples per item instead of 2 and a criterion level of 75%<br />
instead of 90%. In addition, the subject altered allowable<br />
interactions per individual lesson as appropriate, which<br />
reflected complete understanding of the transaction shell<br />
environment.<br />
Conclusion<br />
This initial study prompted the addition of automatic data<br />
collection for both instructors and students to the transaction<br />
shell software. The general results indicate a high level of<br />
acceptability and productivity using transaction shells to author<br />
courseware. Assessment of the quality of the CBI produced has<br />
yet to be completed, although initial data collection on student<br />
performance is underway.<br />
Initial indications are that students require in excess of<br />
3 hours to complete the course module. This means that the<br />
subject's development time to instruction time ratio using this<br />
tool was approximately 10:1. Using traditional authoring tools<br />
189<br />
for this type of material (ignoring the time to create graphics)<br />
would have involved a 200:1 development to instruction time ratio<br />
(Lippert, 1989). Both the tool and the model are worth refining.<br />
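As a quick check on the reported figures, the three process times sum to the stated total, and the development-to-instruction ratio follows directly:

```python
# Reported process times in hours: introductory exercises,
# on-line authoring, and off-line design/planning.
intro, online, offline = 4.75, 14.25, 11.83

# They sum to the 30.83-hour total development time reported.
total = intro + online + offline

# Roughly 3 hours of resulting instruction gives the approximately
# 10:1 development-to-instruction ratio cited in the report.
ratio = total / 3.0
```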
References<br />
Andrews, D. H. & Goodson, L. A. (1980). A comparative analysis<br />
of models of instructional design. Journal of Instructional<br />
Design, 3(4), 2-16.<br />
Falk, R. F. (1987). A Primer for Soft Modeling. Berkeley, CA:<br />
University of California Institute for Human Development.<br />
Gagne, R. M. (1985). The Conditions of Learning and Theory of<br />
Instruction. New York, NY: Holt, Rinehart, and Winston.<br />
Jonassen, D. H. (Ed.) (1988). Instructional Designs for<br />
Microcomputer Courseware. Hillsdale, NJ: Lawrence Erlbaum<br />
Associates.<br />
Hannafin, M. J. & Peck, K. L. (1988). The Design, Development,<br />
and Evaluation of Instructional Software. New York, NY:<br />
Macmillan Publishing Company.<br />
Lippert, R. C. (1989). Expert systems: Tutors, tools, and<br />
tutees. Journal of Computer-Based Instruction, 16(1), 11-19.<br />
MacKnight, C. B. & Balagopalan, S. (1989). Authoring systems:<br />
Some instructional implications. Journal of Educational<br />
Technology Systems, 17(2), 123-134.<br />
Merrill, M. D., Li, Z., & Jones, M. C. (1990). Second generation<br />
instructional design (ID2). Educational Technology, 30(2),<br />
7-14.<br />
Muraida, D. J. & Spector, J. M. (1990). The advanced<br />
instructional design advisor (AIDA): An Air Force project<br />
to improve instructional design. Educational Technology,<br />
30(3), 66.<br />
Muraida, D. J., Spector, J. M., & Dallman, B. E. (1990).<br />
Establishing instructional strategies for advanced<br />
interactive technologies. Proceedings of the 12th Annual<br />
Psychology in the DOD Symposium, 12(1), 347-351.<br />
Spector, J. M. (1990). Designing and Developing an Advanced<br />
Instructional Design Advisor (Technical Report AFHRL-TP-90-52).<br />
Brooks AFB, TX: Training Systems Division.<br />
Tennyson, R. D. (1989). Cognitive Science Update of<br />
Instructional Systems Design Models (AFHRL Contract No.<br />
F3365-88-C-0003). Brooks AFB, TX: Training Systems Division.<br />
190
MILITARY TESTING ASSOCIATION<br />
1990 Annual Conference<br />
FORECASTING TRAINING EFFECTIVENESS (FORTE)<br />
Mark G. Pfeiffer and Richard M. Evans<br />
Naval Training Systems Center and Training Performance Data Center<br />
Orlando, FL<br />
A model was developed to simulate a variety of aviation training device<br />
evaluation outcomes. This simulation model is designed to explore sources<br />
of error threatening the sensitivity of device evaluations. Selection of<br />
evaluation designs is guided by a model that elicits information from<br />
experienced flight instructors. This practical knowledge is transformed<br />
into data that are used in simulating a training effectiveness evaluation.<br />
Effects of variables such as instructor leniency, task difficulty, and<br />
student ability are estimated by two different methods. Available in the<br />
output is an estimate of transfer ratios based on trials-to-mastery, a<br />
diagnosis of deficiencies, an exploration of possible sources of variance,<br />
and an estimate of statistical power and required sample size. Finally, all<br />
data analyses can be accomplished in less than 2 man-days and prior to the<br />
actual field experiment. Estimates of accuracy, reliability, and validity<br />
of the model are high and in an acceptable range.<br />
Background<br />
Major sources of error variance that can mask the true contribution of<br />
a training device to training effectiveness include instructor leniency,<br />
student ability, and task difficulty (McDaniel, Scott & Browning, 1983).<br />
First, instructors' grades are often unreliable criterion measures. Next,<br />
individual abilities among students vary widely. Finally, tasks vary<br />
greatly in difficulty level. Some tasks can be mastered by students in one<br />
or two trials, while others may require 30 trials. These sources of<br />
variance make ratings of students' performance insensitive measures of<br />
training device effectiveness. However, their magnitude can be identified<br />
with sensitivity analysis prior to actual field experiments.<br />
Sensitivity Analysis<br />
Sensitivity analysis is a planning technique (Lipsey, 1983) which<br />
focuses on the impact of variance on variables of interest. The device<br />
evaluation must be carefully planned if the results are to have practical<br />
value and show a true difference between experimental and control groups.<br />
During the planning phase for device evaluations an investment in time may<br />
help identify the problems that introduce unwanted error variance into the<br />
device evaluation. Performance data generated by flight instructors can be<br />
used for this purpose.<br />
The basic framework of the present "sensitivity" analysis differs from<br />
that described by Lipsey (1983) in that it employs the "insensitive"<br />
instructor's rating of students as a performance measure. Lipsey would<br />
rather seek a more sensitive measure. While this rating measure may not be<br />
a particularly good psychometric measure, it is dictated by operational<br />
constraints. Instructors' ratings are used extensively in the transfer of<br />
training literature.<br />
"Approved for public release; distribution is<br />
unlimited."<br />
191
SIMULATION MODEL<br />
The model described here is designed to simulate experimental and<br />
quasi-experimental training effectiveness evaluations of aviation devices.<br />
Values are generated by training experts. Major features of the model<br />
include the following:<br />
. programmable for microcomputers<br />
. extendable to different transfer designs<br />
. helpful in planning field experimental and quasi-experimental<br />
evaluations of devices<br />
. possible data collection by computer or by questionnaire.<br />
Input to the model comes from the ratings made by flight instructors. These<br />
expert judges make estimates of trials-to-mastery needed in the airplane by<br />
replacement pilots with and without prior simulator training using different<br />
device features. Estimates are made by two different methods to permit a<br />
check on cross-method variance and rater reliability.<br />
VARIABLES<br />
In order to gain a perspective of the scope or size of the model it is<br />
helpful for the reader to examine the levels permitted for key variables.<br />
These are shown in table 1. The model is designed so that these limits can<br />
be changed to fit a variety of evaluation designs (Pfeiffer & Browning,<br />
1984).<br />
Table 1<br />
Model Limits<br />
Variable                                          Levels Permitted<br />
Treatment (X1) (Experimental vs. Control)                2<br />
Student Ability (X2) (Fast-Average-Slow)                 3<br />
Task Difficulty (X3) (Easy-Average-Tough)                3<br />
Instructor Leniency (X4) (Easy-Average-Tough)            3<br />
192
DATA INPUT<br />
Two methods are provided for entering data: the interactive method and<br />
the additive method. The data from both interactive and additive methods<br />
are compatible with the following evaluation design: (X1) treatment, (X2)<br />
student ability, (X3) task difficulty, and (X4) instructor leniency.<br />
Combinations of two levels for X1 and three levels for X2, X3, and X4<br />
require 54 data elements.<br />
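The 54-element requirement follows directly from crossing the levels of the four variables (2 x 3 x 3 x 3 = 54). As an illustrative sketch (not part of the original model; the level labels are taken from the variable descriptions above), the full design can be enumerated as follows:<br />

```python
from itertools import product

# Level labels assumed from the variable descriptions in the text.
treatments = ["experimental", "control"]         # X1: 2 levels
abilities = ["fast", "average", "slow"]          # X2: 3 levels
difficulties = ["easy", "average", "tough"]      # X3: 3 levels
leniencies = ["easy", "average", "tough"]        # X4: 3 levels

# Full crossing of all levels: 2 * 3 * 3 * 3 = 54 data elements.
design = list(product(treatments, abilities, difficulties, leniencies))
print(len(design))  # 54
```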
Interactive Method<br />
An expert is asked to estimate the trials required for a replacement<br />
pilot to achieve mastery in the aircraft for each set of training conditions<br />
listed in table 2. These estimates are made twice: first, for the<br />
experimental group (e.g., with prior simulator training) and second for the<br />
control group (e.g., without simulator training). Training conditions and<br />
the data collection instrument for the interactive method are illustrated<br />
below as table 2.<br />
Table 2<br />
Interactive Questionnaire Instrument for Estimating Trials-to-Mastery<br />
CONDITION   INSTRUCTOR   STUDENT   TASK   ESTIMATED TRIALS<br />
1 Easy Fast Easy<br />
2 Easy Fast Tough<br />
3 Easy Slow Easy<br />
4 Tough Fast Easy<br />
5 Easy Slow Tough<br />
6 Tough Fast Tough<br />
7 Tough Slow Easy<br />
8 Tough Slow Tough<br />
The model calls for data on trials-to-mastery for the 27 combinations of<br />
conditions describing the experimental group and the 27 combinations of<br />
conditions describing the control group, a total of 54 conditions. Training<br />
experts need only estimate trials for eight conditions in each group, a<br />
total of 16 conditions. The remaining 38 values (representing the<br />
difference between 16 and 54) are estimated by a regression subroutine in<br />
the model.<br />
193<br />
Valuable time of experts is saved by having the model compute intermediate<br />
data elements.<br />
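One way to picture the regression subroutine is to code the eight expert-rated corner conditions of table 2 as -1/+1, fit a main-effects linear model, and predict the full 27-cell grid for one group. This is a hypothetical sketch: the report does not publish its regression equations, and the trials-to-mastery values below are invented.<br />

```python
import numpy as np

# Expert estimates for the 8 corner conditions of one group, coded
# (instructor leniency, student ability, task difficulty) with
# -1 = easy/fast and +1 = tough/slow.  Rows follow table 2 order;
# the trial counts are invented for illustration.
corners = np.array([
    [-1, -1, -1],  # easy instructor, fast student, easy task
    [-1, -1, +1],
    [-1, +1, -1],
    [+1, -1, -1],
    [-1, +1, +1],
    [+1, -1, +1],
    [+1, +1, -1],
    [+1, +1, +1],
])
trials = np.array([3, 6, 8, 5, 12, 9, 11, 16], dtype=float)

# Fit an intercept plus three main effects to the expert data.
X = np.column_stack([np.ones(len(corners)), corners])
coef, *_ = np.linalg.lstsq(X, trials, rcond=None)

# Predict trials-to-mastery for all 27 combinations of the three
# factors, filling in the cells the expert was never asked about.
levels = (-1, 0, +1)
grid = np.array([(i, s, t) for i in levels for s in levels for t in levels])
predicted = np.column_stack([np.ones(len(grid)), grid]) @ coef
print(len(predicted))  # 27 cells per group
```

With the balanced corner design the fitted intercept equals the mean of the eight expert estimates, so the middle cell of the grid falls at the expert's average.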
Parameters. The parameters identified in table 3 were selected to make<br />
the model flexible, i.e., capable of simulating conditions where the<br />
relative importance of the variables listed can be changed at will by the<br />
analyst. By using a computer terminal, the analyst may input alternative A,<br />
B, C, D, E, or F to establish the relative importance of the variables in<br />
determining expected trials-to-mastery. Relative importance of these<br />
variables is expected to vary from one aircraft community to another.<br />
Table 3<br />
Parameters for Weighting Trials-to-Mastery<br />
Parameter        Relative Importance<br />
A                Instructors   Students      Tasks<br />
B                Students      Instructors   Tasks<br />
C                Tasks         Instructors   Students<br />
D                Instructors   Tasks         Students<br />
E                Students      Tasks         Instructors<br />
F                Tasks         Students      Instructors<br />
Additive Method<br />
The mean trials-to-mastery for the experimental and control groups,<br />
obtained by the interactive method, are used as a basis for the values used<br />
in the additive method. Here the same expert is asked to estimate<br />
trials-to-mastery for each of the conditions one at a time. The questions<br />
are phrased as deviations around the mean trials-to-mastery (table 4).<br />
Training experts estimate six conditions in each group, a total of 12<br />
conditions. The remaining 42 values (representing the difference between 12<br />
and 54) are estimated by the computer model according to the rules of<br />
additive conjoint measurement (Luce & Tukey, 1964).<br />
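Under the additive rule, each of the 27 cells in a group can be reconstructed as the group mean plus one deviation per factor level, which is why six expert answers per group suffice. A minimal sketch (the mean and deviation values below are invented for illustration):<br />

```python
from itertools import product

# Hypothetical mean trials-to-mastery for one group, carried over
# from the interactive method, and invented expert deviations for
# the six additive questions (average levels deviate by zero).
mean_trials = 8.0
instructor = {"easy": -2.0, "average": 0.0, "tough": +2.0}
student = {"fast": -3.0, "average": 0.0, "slow": +3.0}
task = {"easy": -2.5, "average": 0.0, "tough": +2.5}

# Additive rule: every cell is the mean plus the three level deviations.
cells = {
    (i, s, t): mean_trials + instructor[i] + student[s] + task[t]
    for i, s, t in product(instructor, student, task)
}
print(len(cells))                        # 27
print(cells[("easy", "fast", "easy")])   # 0.5
```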
Reliability Check<br />
Since each training expert is asked for inputs to the model by two<br />
different methods, a check on methodological variance is possible by<br />
correlating the values obtained by the interactive and additive methods (N =<br />
54). This correlation is computed across methods for experimental and<br />
control groups.<br />
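The cross-method check is an ordinary Pearson correlation over the 54 paired values. A standard-library sketch (the two estimate lists are invented stand-ins for one expert's interactive and additive data):<br />

```python
import statistics as st

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = st.mean(xs), st.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented stand-ins for one expert's 54 interactive and additive estimates.
interactive = [4, 5, 7, 6, 9, 8, 11, 10, 13, 12] * 5 + [4, 5, 7, 6]
additive = [v + 1 for v in interactive]  # perfectly linear relation
print(round(pearson(interactive, additive), 3))
```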
SUMMARY OF MODEL FLOW<br />
Input, output, and interactive aspects of the model are summarized in<br />
figure 1 and figure 2.<br />
194
Figure 1. Model flow and data estimating procedure.<br />
Figure 2. Analysis of experimental and control groups and data storage.<br />
195
Table 4<br />
Additive Questionnaire Instrument for Estimating Trials-to-Mastery<br />
IF AN AVERAGE STUDENT REQUIRES *N* TRIALS TO LEARN TO<br />
MASTERY, HOW MANY TRIALS WILL A . . . FAST LEARNER REQUIRE?<br />
. . . SLOW LEARNER REQUIRE?<br />
IF AN AVERAGE INSTRUCTOR REQUIRES *N* TRIALS TO TRAIN<br />
STUDENTS, HOW MANY TRIALS WILL . . . AN EASY INSTRUCTOR NEED?<br />
. . . A TOUGH INSTRUCTOR NEED?<br />
IF *N* TRIALS ARE NEEDED FOR AVERAGE TASKS, HOW MANY<br />
TRIALS WOULD . . . AN EASY TASK REQUIRE?<br />
. . . A TOUGH TASK REQUIRE?<br />
VALIDATION AND APPLICATION<br />
The model was validated in the helicopter community using a concurrent<br />
validation design. Criterion data for the simulation were collected during<br />
an experimental evaluation of Device 2F64C, an SH-3 simulator located at the<br />
Naval Air Station, Jacksonville, Florida. Trials-to-mastery obtained from<br />
the simulation model were compared with the trials-to-mastery obtained from<br />
the field experiment (Evans, Scott & Pfeiffer, 1984).<br />
SUBJECTS AND PROCEDURE<br />
Thirteen flight instructors currently involved in training pilots in<br />
Device 2F64C were asked to estimate trials-to-mastery by two different<br />
methods. The subjects, one at a time, made their estimates at a computer<br />
terminal. One half-hour per subject was required to complete both the<br />
additive and interactive rating tasks.<br />
VARIABLES<br />
Four independent variables were included in the validation<br />
design: (X1) device feature, (X2) student ability, (X3) task<br />
difficulty, and (X4) instructor leniency. All combinations of two levels for<br />
X1 and three levels for X2, X3, and X4 produced 54 data points for a<br />
regression analysis against estimated trials-to-mastery. Trials-to-mastery<br />
(Y) in the aircraft was the dependent variable.<br />
EVALUATION DESIGN SENSITIVITY<br />
The usual purpose of a device feature evaluation is to extract the<br />
variance due to the device features, e.g., visual and motion vs. motion<br />
only. The modeled data can also be used to do a power analysis of the<br />
one-trial difference (actually 1.04) between device features. Power<br />
analysis provides an estimate of the sample sizes needed to demonstrate that<br />
this one-trial difference (experimental mean = 4.61, SD = 1.83; control<br />
mean = 5.65, SD = 2.07) is reliable (Pfeiffer, Evans & Ford, 1985).<br />
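The report does not state its power formula, but a standard normal-approximation sketch using the reported means and standard deviations gives a feel for the numbers: the one-trial difference amounts to roughly half a pooled standard deviation, so dozens of subjects per group would be needed. The alpha, power, and z values below are conventional choices, not figures from the paper.<br />

```python
import math

# Summary statistics reported for the modeled data.
mean_exp, sd_exp = 4.61, 1.83   # experimental group
mean_ctl, sd_ctl = 5.65, 2.07   # control group

# Pooled standard deviation and standardized effect size (Cohen's d).
sd_pooled = math.sqrt((sd_exp**2 + sd_ctl**2) / 2)
d = abs(mean_ctl - mean_exp) / sd_pooled

# Normal-approximation sample size per group for a two-sided test at
# alpha = .05 with power = .80 (z values are the standard constants).
z_alpha, z_beta = 1.960, 0.842
n_per_group = math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)
print(round(d, 2), n_per_group)
```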
196
The linear model indicates that the smallest amount of variance is<br />
accounted for by device features (.07). The combined other sources of<br />
variance (instructor leniency, student ability, and task difficulty:<br />
.21 + .27 + .42 = .90) are predicted to mask out the variance due to the device<br />
features. Evaluators could also artificially change their ratings to<br />
reflect the impact of anticipated evaluation design changes. A<br />
reexamination of summary statistics would permit evaluators to assess the<br />
impact of hypothetical design modifications on the anticipated outcome of<br />
the device evaluation.<br />
DISCUSSION<br />
Using data from a simulation model, the training effectiveness analysis<br />
estimated that the one-trial difference between training under the visual<br />
plus motion condition and motion alone would not be statistically<br />
significant with a reasonable sample size (Ott, 1977). This outcome of the<br />
model was confirmed through analysis of actual field data (Evans, Scott &<br />
Pfeiffer, 1984). With this insight from the model, the evaluator of a<br />
device would know in advance that control of task difficulty, student<br />
ability, and instructor leniency in a field experiment would be necessary to<br />
increase statistical power. True training effects attributable to the<br />
device features are more likely to be revealed when extraneous errors are<br />
controlled. Cochran and Cox (1957) have presented a theoretical discussion<br />
of this problem. Instructors' rating variance, for example, may be<br />
controlled by utilizing a standardized method for identifying when the<br />
student has achieved mastery (Rankin & McDaniel, 1980). Some criterion<br />
measure other than instructors' ratings could also be employed. A specific<br />
example is automated performance measurement on the tactical range, which<br />
unfortunately is not widely available for scientific measurement of aircraft<br />
in free flight. However, performance measurement is available in flight<br />
simulators. Computer-aided techniques for providing operator performance<br />
measures have been provided by Connelly, Bourne, Loental and Knoop (1974).<br />
CONCLUSION<br />
This study shows that flight instructors who have knowledge of a<br />
training situation, but who are not necessarily proficient with the<br />
intricacies of research design and statistics can provide data useful for<br />
planning a field experiment (device evaluation). The programs described<br />
herein are "user-friendly" and resident in a portable microcomputer. Should<br />
the computer be unavailable, a questionnaire could be used (Appendix). The<br />
utility of this approach depends, in part, on asking the right questions for<br />
a particular training environment and in part on developing the responses to<br />
such questions into meaningful information. The model just described has<br />
provided that utility for the present situation. Additionally, this model<br />
may be easily adapted to other training problems involving expert ratings<br />
(see Pfeiffer and Horey, 1988).<br />
197<br />
REFERENCES<br />
Cochran, W. G., & Cox, G. M. (1957). Experimental designs. New<br />
York: John Wiley.<br />
Connelly, E. M., Bourne, F. J., Loental, D. G., & Knoop, P. A.<br />
(1974). Computer-aided techniques for providing operator<br />
performance measures (AFHRL-TR-74-87). Dayton, OH:<br />
Wright-Patterson Air Force Base.<br />
Dawes, R. M. (1979). The robust beauty of improper linear models<br />
in decision making. American Psychologist, 34, 571-582.<br />
Evans, R. M., Scott, P. G., & Pfeiffer, M. G. (1984). SH-3<br />
helicopter flight training: An evaluation of visual and<br />
motion simulation in Device 2F64C (Technical Report 161).<br />
Orlando: Training Analysis and Evaluation Group, Naval<br />
Training Equipment Center.<br />
Lipsey, M. W. (1983). A scheme for assessing measurement<br />
sensitivity in program evaluation and other applied research.<br />
Psychological Bulletin, 94, 152-165.<br />
Luce, R. D., & Tukey, J. W. (1964). Simultaneous conjoint<br />
measurement: A new type of fundamental measurement. Journal<br />
of Mathematical Psychology, 1, 1-27.<br />
McDaniel, W. C., Scott, P. G., & Browning, R. F. (1983).<br />
Contribution of platform motion simulation in SH-3 helicopter<br />
pilot training (Technical Report 153). Orlando: Training<br />
Analysis and Evaluation Group, Naval Training Equipment<br />
Center.<br />
Ott, L. (1977). An introduction to statistical methods and data<br />
analysis. North Scituate, MA: Duxbury Press.<br />
Pfeiffer, M. G., & Browning, R. F. (1984). Field evaluations of<br />
aviation trainers (Technical Report 157). Orlando:<br />
Training Analysis and Evaluation Group, Naval Training<br />
Equipment Center.<br />
Pfeiffer, M. G., Evans, R. M., & Ford, L. H. (1985). Modeling<br />
field evaluations of aviation trainers (Technical Note 1-85).<br />
Orlando: Training Analysis and Evaluation Group, Naval<br />
Training Equipment Center.<br />
Pfeiffer, M. G., & Horey, J. D. (1988). Forecasting training<br />
device effectiveness: Three devices (Technical Report<br />
88-028). Orlando: Naval Training Systems Center.<br />
Rankin, W. C., & McDaniel, W. C. (1980). Computer aided training<br />
evaluation and scheduling (CATES) system: Assessing flight<br />
task proficiency (Technical Report 94). Orlando: Training<br />
Analysis and Evaluation Group, Naval Training Equipment<br />
Center.<br />
198
Cost-Effectiveness of Home Study using Asynchronous<br />
Computer Conferencing for Reserve Component Training 1,2<br />
Ruth H. Phelps, Ph.D.<br />
Major Robert L. Ashworth, Jr.<br />
U.S. Army Research Institute for the<br />
Behavioral and Social Sciences<br />
Heidi A. Hahn, Ph.D.<br />
Idaho National Engineering Laboratory<br />
Abstract<br />
The resident U.S. Army Engineer Officer Advance<br />
Course was converted for home study via asynchronous<br />
computer conferencing (ACC). Students and instructors<br />
communicated with each other using computers at home,<br />
thus creating an "electronic classroom". Test scores,<br />
completion rates, student perceptions and costs were<br />
compared to resident training. Results showed that ACC<br />
performance is equal to that of resident training, and<br />
costs are lower.<br />
Geographical dispersion, limited training time and civilian<br />
job and family demands make travel to resident schools for<br />
training and education difficult for the Reserve Component (RC).<br />
Not only is it a hardship for soldiers to leave jobs and family,<br />
but their units are unable to conduct collective training when<br />
soldiers are absent. In addition, training soldiers at resident<br />
schools has become so costly that HQ TRADOC has proposed a 50%<br />
reduction in the number of soldiers traveling to resident<br />
training by 2007 (TRADOC PAM 350-4).<br />
The purpose of this paper is to summarize an investigation<br />
of an alternative means for meeting the educational requirements<br />
of the RC. The goals are to (1) develop and test a new training<br />
option, using asynchronous computer conferencing (ACC), that<br />
1 These data are summarized from Hahn, H., Ashworth, R.,<br />
Wells, R., & Daveline, K. (in preparation). Asynchronous<br />
Computer Conferencing for Remote Delivery of Reserve<br />
Component Training (Research Report). Alexandria, VA: U.S.<br />
Army Research Institute for the Behavioral and Social<br />
Sciences.<br />
2 This paper is not to be construed as an official Department<br />
of the Army document in its present form.<br />
199
would not require soldiers to leave their homes and units and<br />
yet maintain the quality of training typically found at the<br />
branch school; (2) determine the cost-effectiveness of<br />
developing and operating the ACC alternative.<br />
Asynchronous computer conferencing is a means for<br />
communicating from different locations at different times (i.e.,<br />
asynchronously) using a computer network. For training<br />
purposes, an "electronic classroom" is established by connecting<br />
all students with each other and the instructional staff. A<br />
student or instructor can participate in the classroom from any<br />
location using existing telephone lines and a computer equipped<br />
with a modem. Students can work together in groups, ask<br />
questions of the instructors, tutor their classmates or share<br />
their thoughts and experiences. Instructors can direct<br />
individual study, conduct small group instruction, answer<br />
questions, give remedial instruction and provide exam feedback<br />
to the students.<br />
Method<br />
Participants<br />
Fourteen RC officers (13 males, 1 female) took Phase III of<br />
the Engineer Officer Advanced Course (EOAC) by ACC home study.<br />
For comparison purposes, performance data were collected from<br />
RC students taking the same course in residence at the U.S. Army<br />
Engineer School from October, 1986 to June, 1989.<br />
The instructional staff consisted of a civilian full-time<br />
course manager/administrator responsible for the overall<br />
operation of the course and three part-time instructors. The<br />
part-time instructor responsibilities included directing group<br />
discussions, remedial instruction and/or monitoring student<br />
progress.<br />
Course Description<br />
Course materials consisted of Module 6 of the EOAC (66<br />
program hours of instruction). Media used included paper based<br />
readings and problems, computer-aided instruction, video tapes<br />
and computer conferencing discussion. Topics covered were Army<br />
doctrine (e.g., rear operations), technical engineering (e.g.,<br />
bridging, flexible pavements), leadership and presentation<br />
skills. The program of instruction was identical for the ACC<br />
and resident classes.<br />
200<br />
Equipment, Procedure and Data Analysis<br />
Each student was provided with an IBM XT computer with 20<br />
megabyte hard disk, color monitor and printer. Software and<br />
courseware loaded on each computer consisted of: (1) a<br />
specially developed course management system and communications<br />
package; (2) computer-assisted instruction and tests; (3) word<br />
processing package; (4) spreadsheet.<br />
Communication software for asynchronous computer<br />
conferencing was provided through U.S. Army Forum, Office of the<br />
Director of the Army Staff. The host computer was located at<br />
Wayne State University and used the CONFER II conferencing<br />
software system.<br />
The course was conducted from September, 1988 to April,<br />
1989. Students were mailed all their computer equipment with<br />
written assembly and operation instructions and course<br />
materials. In addition they were provided with a toll free<br />
"hot line" telephone number for resolving hardware/software<br />
problems. The first lessons to be completed were self-conducted<br />
and designed to familiarize the student with the operation of<br />
the computer and software. Scores for computer training were<br />
not included in overall course grades.<br />
Part-time instructional staff were provided the same<br />
equipment and software as the students. In addition they were<br />
given a 40 hour training course on operating the hardware/<br />
software, instructional responsibilities and<br />
teaching/motivational techniques. Instructional staff and<br />
researchers met together to conduct this training using a<br />
combination of lecture and hands-on practice with the computer.<br />
There were four types of data collected: (1) test,<br />
practical exercise and homework scores; (2) pre- and post-course<br />
student perceptions of their amount of knowledge on the course<br />
topics; (3) course completion; (4) cost of converting and<br />
executing the course. Comparisons of the resident to the ACC<br />
course were made using multivariate analysis of variance procedures<br />
for a two-group design.<br />
Results<br />
As shown in the top of Table 1, there was no reliable<br />
difference between the test scores of students in residence<br />
versus ACC. A comparison of the students' self ratings of their<br />
level of knowledge before and after the course, showed that the<br />
ACC group had significantly greater gains in their perceived<br />
amount of learning, as shown in the bottom of Table 1.<br />
Completion data showed that 95% of resident students completed<br />
the course compared to 64% of the ACC students.<br />
201<br />
Table 1<br />
Student Scores and Ratings<br />
Scores                       ACC       Resident   Significance<br />
Tests                        92.0%     86.4%      NS<br />
Homework                     88.8%     92.0%      NS<br />
Practical Exercise           90.4%     89.9%      NS<br />
Perceived Amount Learned     33%       12%        p < .05<br />
(% Post-Pre)<br />
Cost data were computed separately for (1) converting an<br />
existing course for delivery by ACC and (2) executing each<br />
iteration of the course. If the conversion were done<br />
by within-government staff, then the cost would be approximately<br />
$296,100. If it were done under contract, then the cost is<br />
estimated at $516,200. Start-up costs of equipment purchase and<br />
instructor training were estimated to be $73,100 for within-<br />
government and $96,000 for contractor. Costs that will recur<br />
with each iteration were estimated at $234,400 for within-<br />
government and $420,900 for contractor.<br />
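The one-time versus recurring structure of these figures can be sketched as a cumulative-cost function over course iterations. Resident per-iteration costs are not reported in this summary, so the break-even comparison cannot be reproduced here; only the two ACC options are computed, using the dollar figures above.<br />

```python
# Reported cost figures for converting and running the EOAC by ACC.
options = {
    "ACC (within-government)": {"conversion": 296_100, "startup": 73_100,
                                "recurring": 234_400},
    "ACC (contractor)":        {"conversion": 516_200, "startup": 96_000,
                                "recurring": 420_900},
}

def cumulative_cost(option, iterations):
    """One-time conversion and start-up costs plus per-iteration costs."""
    c = options[option]
    return c["conversion"] + c["startup"] + c["recurring"] * iterations

# Total cost of each option over 10 course iterations.
for name in options:
    print(name, cumulative_cost(name, 10))
```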
Figure 1. Relative costs of EOAC alternatives over 10 course<br />
iterations.<br />
202
Figure 1 shows the total course conversion, start-up plus<br />
the recurring costs over 10 course iterations. Initially<br />
resident and ACC (within government) are similar with ACC<br />
(contractor) costs being nearly twice as much. However, when<br />
the costs of conversion and execution are amortized, ACC<br />
(contractor) becomes less costly than resident training after<br />
four course iterations. After five iterations ACC (within<br />
government) would save 47% and ACC (contractor) would save 6%.<br />
Cost-effectiveness ratios were computed by combining the<br />
cost and completion rate data. The ratio was greatest for ACC<br />
using government staff (.64), second for resident training<br />
(.41), and lowest for ACC using contractor staff (.36).<br />
Discussion<br />
It has been shown in this report that there is a cost-<br />
effective alternative to sending RC soldiers to branch schools<br />
for resident training. Training by ACC can be conducted just as<br />
effectively and for less money. Thus, this technology appears<br />
to meet the need of the RC to complete educational requirements<br />
from the home or homestation, without long absences from the<br />
unit. The "electronic classroom" could be conducted remotely<br />
from existing educational institutions such as the branch school<br />
and/or the U.S. Army Reserve Forces School in order to maintain<br />
standardized instruction.<br />
Additional research is needed, however, to improve the<br />
completion rate for ACC home study. Reasons for dropping out of<br />
the experimental course were related to limited time due to<br />
competing activities such as civilian jobs and family. A means<br />
of predicting which soldiers are likely to succeed or drop out<br />
of home study will assist Army trainers in both selecting<br />
students and providing assistance for those at high risk.<br />
References<br />
U.S. Army Training and Doctrine Command. (1989). Army Training<br />
2007. (TRADOC Pamphlet 350-4). Ft. Monroe, VA: Author.<br />
203
TEST DESIGN AND MINIMUM CUTOFF SCORES<br />
Sandra Ann Rudolph, Training Appraisal<br />
Chief of Naval Technical Training<br />
INTRODUCTION<br />
It has become increasingly obvious in the last few years that<br />
the United States government cannot continue to operate with little<br />
concern for who will pay the bill. The apparent message is to<br />
do better with less. This means we must become more efficient in<br />
our way of conducting business. For many of us, our business<br />
is training. Being efficient means we must use our resources<br />
wisely for the purpose intended. In training our resources are<br />
numerous--training devices, curriculum, instructors--while our<br />
purpose is solitary--provide the training necessary for graduates<br />
to perform in the fleet. While performance is the key, there is<br />
background knowledge that is necessary for the trainee to grasp the<br />
performance.<br />
BACKGROUND<br />
In the training environment of yesterday, where money was no<br />
object, training was easier. There was little concern for statistical<br />
evaluation, effectiveness, or efficiency. We trained by<br />
the seat of our pants--experience wasn't the best teacher, it was<br />
the ONLY teacher. Today, lack of attention in these areas could<br />
mean loss of training dollars. One of the big areas of concern<br />
deals with attrition--or the dropping of trainees from a designated<br />
training program. While there are many causes for attrition,<br />
recent attrition analysis visits to such schools as Air Traffic<br />
Control School, Music School, and Boiler Technician/Machinist Mate<br />
School, indicate that testing programs may be at the very core of<br />
many of our problems. The following questions were used to<br />
determine how knowledge testing was being used to measure success:<br />
(1) Have critical course objectives been identified with<br />
corresponding emphasis on testing?<br />
(2) Have the knowledge tests been designed to measure the<br />
objectives to the learning level required?<br />
(3) How was the minimum cutoff score for the knowledge<br />
tests determined?<br />
(4) Has the test design and cutoff score been validated?<br />
(5) Have alternate versions of the tests been developed<br />
that are consistent with the valid test design?<br />
It became apparent that testing was a problem. It was<br />
discovered that the emphasis and training had been placed on<br />
individual test-item development and test-item analysis, not on<br />
test development and test analysis. In other words, there was no<br />
assurance that the objectives were being tested nor any evidence<br />
on how the cutoff score was determined. To standardize the<br />
approach to test design, the following process was established:<br />
204
(1) Determine criticality of the objectives.<br />
(2) Determine test design.<br />
(3) Establish a minimum cutoff score.<br />
(4) Validate the test.<br />
DISCUSSION<br />
Criticality of the objectives<br />
The objectives of a course are those behaviors the trainee is<br />
expected to exhibit upon completion of training. Regardless of the<br />
method of development, objectives are established with varying<br />
degrees of importance or criticality. Therefore, determining the<br />
importance of the objectives must occur prior to designing the<br />
test. While there is not an established set of procedures to<br />
determine criticality, the following examples have proven to be<br />
valid.<br />
(1) Rank ordering of objectives. Subject matter experts rank<br />
the objectives from most important to least important. This<br />
method is most useful when courses have a small number of objectives.<br />
(2) Yes or No. Subject matter experts determine criticality<br />
by responding "Yes" or "No". The greatest disadvantage to this<br />
approach is that some critical objectives are more critical than<br />
others and vice versa.<br />
(3) Criticality based on a scale ranking. This method uses<br />
a set of questions to guide in determining criticality.<br />
(a) How important is this behavior to successful<br />
performance in the fleet?<br />
(b) How difficult is the behavior to learn?<br />
(c) How important is the behavior to successful<br />
performance in the course?<br />
A scale is normally established as 0-5 or 0-10. Based on the<br />
above or similar questions, each objective is reviewed by subject<br />
matter experts; a number value is assigned and the average calculated.<br />
The objectives are then ranked. Objectives falling above the<br />
established cutoff are considered critical. The cutoff score is<br />
normally a number based on the scale used. For example, any<br />
objective ranked 3 or above on a O-5 scale might be considered<br />
critical. This number will vary and is based on the individual<br />
course and its mission. This method provides the most complete way<br />
to determine criticality. The disadvantage is that<br />
it may be complicated and time consuming.<br />
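As an illustration, the scale-ranking method can be sketched in a few lines of code. This is a hypothetical sketch only: the objective names, the SME ratings, and the cutoff of 3 on a 0-5 scale are invented for the example, not taken from an actual course.<br />

```python
# Hypothetical sketch of the scale-ranking method. Each subject matter
# expert (SME) rates every objective on a 0-5 scale; objectives whose
# average rating meets the cutoff (here 3) are flagged as critical.
def rate_criticality(ratings, cutoff=3.0):
    """ratings maps objective -> list of SME scores (0-5).
    Returns a dict mapping objective -> (average, is_critical)."""
    result = {}
    for objective, scores in ratings.items():
        avg = sum(scores) / len(scores)
        result[objective] = (avg, avg >= cutoff)
    return result

# Invented example objectives rated by three SMEs:
sme_ratings = {
    "Trace a fuel-system casualty": [5, 4, 4],
    "List boiler safety devices": [3, 3, 4],
    "State the history of the rating": [1, 2, 1],
}
for obj, (avg, critical) in rate_criticality(sme_ratings).items():
    print(f"{obj}: {avg:.2f} {'critical' if critical else 'not critical'}")
```

On the invented data above, the first two objectives average 3 or higher and would be treated as critical; the third would not.<br />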
Test Design<br />
As with any research project, the researcher must have a plan.<br />
Without this plan, the researcher would be looking for information<br />
with little or no direction. The test design is a plan for<br />
ensuring that the objectives are tested and a plan for measuring<br />
the student's success in accomplishing the objectives.<br />
The process of designing a test builds upon the previous step<br />
of determining criticality of the objectives.<br />
There is no proven scientific method to determine the exact<br />
test design. It is an opinion based on experience. This opinion<br />
can be strengthened through consensus. Therefore, the design must be<br />
based on the ideas of several subject matter experts and not one or<br />
two individuals. If a consensus cannot be reached, then an average<br />
should be taken. Consensus should be an underlying concern<br />
throughout the test design process. Consensus of the right persons<br />
improves the chances of producing a valid test.<br />
Step One. Group the objectives in the order in which they<br />
will be tested. Factors to consider are:<br />
(1) The difficulty of the material needed to accomplish the<br />
objective.<br />
(2) The length of the material needed to accomplish the<br />
objective.<br />
For more difficult material, fewer objectives should be<br />
grouped for testing purposes. For example, an objective that is<br />
very difficult to accomplish may require individual testing, while<br />
several simpler objectives may be tested together. The longer it<br />
takes to teach the objective, the fewer objectives should be tested<br />
together. For example, an objective that is taught in three days<br />
may require individual testing while the objective that is taught<br />
in three periods may be tested with other objectives.<br />
Step Two. Determine the number of test items per objective.<br />
The concern is to have enough test items on a test to ensure the<br />
measurement of each objective. Several factors to consider are:<br />
(1) Criticality of the objective. The more critical the<br />
objective, the more items may be required.<br />
(2) Type of objective tested. If the test is comprised of<br />
both critical and noncritical objectives, normally the critical<br />
objectives should contain more items. The more items asked, the<br />
greater the confidence that the trainee has grasped the objective.<br />
(3) Number of objectives tested. If the test contains<br />
several objectives, be aware of the total number of items on the test<br />
and the time constraints.<br />
(4) Length of the material tested. If an objective can<br />
be taught in three periods, it should require fewer test items<br />
than the objective that is taught in three days.<br />
(5) Difficulty of the material. When the material is very<br />
difficult, it may require fewer items written to a much more<br />
difficult level.<br />
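The factors above do not come with a formula, so as a thought experiment, a simple allocation rule might look like the following sketch. The weighting scheme (double items for critical objectives, one extra item per three periods of instruction) is an invented assumption, not an established rule.<br />

```python
# Illustrative item-allocation sketch. The factors (criticality, length)
# come from the text; the specific weights below are invented.
def allocate_items(objectives, base_items=2):
    """objectives: list of dicts with 'name', 'critical' (bool), and
    'periods' (instruction time in class periods). Critical objectives
    get double the base count; one extra item per 3 periods of length."""
    plan = {}
    for obj in objectives:
        items = base_items * (2 if obj["critical"] else 1)
        items += obj["periods"] // 3
        plan[obj["name"]] = items
    return plan

plan = allocate_items([
    {"name": "Light off the boiler", "critical": True, "periods": 9},
    {"name": "Identify gauge types", "critical": False, "periods": 2},
])
print(plan)  # the critical, longer objective receives more items
```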
Step Three. Determine the level of learning of the test<br />
items. Depending on the status of the curriculum, test items may<br />
or may not already be available. While several levels of learning<br />
exist, the following five levels are suggested for use:<br />
(1) Knowledge. Test items that measure a student's<br />
ability to identify or recall specific terms, facts, rules, etc.<br />
as they are taught. Knowledge represents the lowest level of<br />
learning for a test item.<br />
(2) Comprehension. Test items that measure a student's<br />
ability to grasp the meaning of material. This may be done by<br />
interpreting, explaining, or translating information. This is a<br />
higher level of learning than knowledge, but the lowest level of<br />
understanding.<br />
(3) Application. Test items that measure the student's<br />
ability to use learned material in new and concrete situations.<br />
This type of test item requires a higher level of understanding<br />
than comprehension.<br />
(4) Analysis. Test items that measure the student's<br />
ability to break down material into components so that an<br />
organizational structure may be understood. This may require the<br />
identification of parts, analysis of relationships between parts,<br />
and recognition of the organizational principles involved. These<br />
types of test items represent a higher level of learning than<br />
comprehension and application because they require an<br />
understanding of both the content and the structural form of the<br />
material.<br />
(5) Evaluation. Test items that measure the student's<br />
ability to judge the value of material for a given purpose. The<br />
judgements are based on definite criteria. This type of test<br />
item represents the highest learning level because it contains<br />
all the other categories.<br />
When determining the learning level to which the test item<br />
should be written, the item must reflect the objective. The<br />
following factors should be considered:<br />
(1) Test items must be written that support the objective.<br />
This means that if the objective calls for a basic knowledge of<br />
the material, the test items should be written to the knowledge<br />
learning level.<br />
(2) If the objective calls for an understanding of the<br />
material, then the test item should be written to one of the<br />
higher learning levels.<br />
(3) If the objective calls for a higher learning level,<br />
not all test items should be written to the highest level.<br />
Enough must be on the test to ensure that the student has met the<br />
objective to the learning level required.<br />
Step Four. Select appropriate test items from the test bank<br />
or develop test items. If a test bank is already in existence,<br />
each item must be cross-referenced to the objective it<br />
supports and a level of learning identified.<br />
If the test bank<br />
does not have an adequate number of items, new items may be<br />
required. If it appears that new items that support the objective<br />
are difficult to prepare, the plan may need to be altered.<br />
Step Five. Establish a minimum cutoff score. Setting a cutoff<br />
score means that a point must be determined that differentiates<br />
between the student that has achieved the objective and the student<br />
that has not. If the first four steps have been followed, it is<br />
safe to assume that the test has content validity. If there is any<br />
doubt, the test should be reviewed again before attempting to<br />
establish the minimum cutoff score.<br />
Setting a cutoff score, as with the other steps, is a judgemental<br />
process. While several methods of establishing the minimum<br />
cutoff score exist, the following methods are suggested.<br />
METHOD 1<br />
(1) A panel of subject matter experts is selected based on<br />
their current knowledge of the job and the performance required<br />
of the graduate in the fleet.<br />
(2) A discussion should be conducted centering on what<br />
constitutes a minimally competent person. Caution should be taken not to<br />
allow one person to dominate the discussion and that the goal<br />
should be one of consensus. The discussion is designed to get all<br />
members thinking along the same lines.<br />
(3) Next, the technique of establishing the cutoff score<br />
should be explained.<br />
(a) Review each test item on the test.<br />
(b) Check items that the student with minimum<br />
competency should be expected to know. Care<br />
should be taken that this is not what the average<br />
student will know, or what the subject matter<br />
expert would like for them to know.<br />
(c) If there are any items that the student must know,<br />
these items will be noted.<br />
(d) Add the number of checks for each objective.<br />
(e) Average the total responses and this becomes the<br />
minimum cutoff score for the objective.<br />
METHOD 2<br />
(a) With this method, subject matter experts determine<br />
the percentage of the students that should answer<br />
the test item correctly. Again this is dealing<br />
with minimum competency.<br />
(b) An average of the percentages yields the minimum<br />
cutoff score.<br />
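Both suggested methods reduce to simple averages of SME judgments. A minimal sketch, with invented numbers:<br />

```python
# Sketches of the two suggested cutoff-score methods; all data invented.
def cutoff_method_1(checks_per_sme):
    """Method 1: each SME checks the items a minimally competent student
    should answer correctly; the average check count is the cutoff."""
    return sum(checks_per_sme) / len(checks_per_sme)

def cutoff_method_2(item_percentages):
    """Method 2: each entry is the SME-judged percentage of minimally
    competent students expected to answer an item correctly; the
    average percentage is the minimum cutoff score."""
    return sum(item_percentages) / len(item_percentages)

# Three SMEs checked 14, 16, and 15 items on a hypothetical 20-item test:
print(cutoff_method_1([14, 16, 15]))      # 15.0 items
# SME-judged percentages for four hypothetical items:
print(cutoff_method_2([90, 70, 80, 60]))  # 75.0 percent
```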
Regardless of the method used, there is never any hard and<br />
fast criterion for what is competent and what is not. Some<br />
students are clearly competent based on their scores. Some<br />
students are clearly not competent based on their scores. There<br />
is a certain group of students that may meet the cutoff score and<br />
not be competent. There is normally an equal number of students<br />
that do not meet the cutoff score that are competent.<br />
In the final analysis of the cutoff score, it comes to a<br />
decision concerning which is the greater danger: to fail a<br />
qualified person or to pass an unqualified person. For progress<br />
tests, it is probably acceptable to pass an unqualified person. For<br />
exit exams, particularly when safety is a factor, it is probably<br />
better to fail a qualified person than to pass an unqualified<br />
person.<br />
Step Six. Validation process. Content validity has already<br />
been established. Validating the minimum cutoff score is a process<br />
achieved over time by administering the test and plotting the<br />
scores. If the scores indicate that almost all students are passing,<br />
the cutoff score may be too low. This is true only if<br />
noncompetent students are passing. If all the students who pass are<br />
competent, then the cutoff score may be acceptable. If the scores<br />
indicate that most students fail, the cutoff score may be too high.<br />
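The Step Six review can be mocked up as a simple pass-rate check. The 90% and 50% trigger points below are invented for illustration; the text only says "most" students.<br />

```python
# Minimal sketch of the Step Six review: tally scores from successive
# administrations and flag a cutoff that may be mis-set. The 90%/50%
# thresholds are assumptions, not prescribed values.
def review_cutoff(scores, cutoff, high=0.90, low=0.50):
    """Return an advisory string based on the passing rate."""
    passing = sum(1 for s in scores if s >= cutoff) / len(scores)
    if passing >= high:
        return "passing rate high - cutoff may be too low"
    if passing <= low:
        return "passing rate low - cutoff may be too high"
    return "cutoff appears acceptable"

# Invented scores from eight students against a cutoff of 60:
print(review_cutoff([62, 70, 85, 90, 58, 77, 81, 66], cutoff=60))
```

As the text cautions, a high passing rate is a problem only if noncompetent students are among those passing, so such a flag is advisory, not conclusive.<br />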
SUMMARY<br />
In conclusion, the process is being tested. Training has been<br />
provided to all the sites where attrition analysis visits have been<br />
conducted. Since the training is recent, it is difficult to assess<br />
what impact the process has had on attrition. While attrition has<br />
been lowered in each case, it is not possible to pinpoint any<br />
specific cause. One thing we are confident of is that this<br />
process leads to better test validity and that the objectives are<br />
being measured to the degree specified.<br />
Subjective and Cognitive Reactions to Atropine/2-PAM,<br />
Heat, and BDU/MOPP-IV<br />
John L. Kobrick, Richard F. Johnson, and Donna J. McMenemy<br />
US Army Research Institute of Environmental Medicine<br />
Natick, Massachusetts 01760-5007<br />
The current US armed forces nerve agent antidote is a<br />
combination of 2 mg atropine sulfate (atropine) and 600 mg<br />
pralidoxime chloride (2-PAM) administered by paired intramuscular<br />
injections. Although these drugs provide good physical<br />
protection, they have side effects which could lead to adverse<br />
subjective reactions and impaired performance (Taylor, 1980).<br />
The major physiological reactions to atropine alone<br />
(Marzulli & Cope, 1950), and to atropine in combination with heat<br />
stress (Kolka, Stephenson, Bruttig, Cadarette, & Gonzalez, 1987)<br />
have been identified. Effects on psychological, perceptual, and<br />
cognitive behavior are less clear, although some performance-<br />
oriented studies have been reported (Baker et al., 1983; Moylan-<br />
Jones, 1969; Penetar & Henningfield, 1986; Wetherell, 1980). The<br />
physiological effects of 2-PAM alone and in combination with<br />
atropine have also been studied (Holland, Kemp, & Wetherell,<br />
1978). Much less is known about associated psychological and<br />
perceptual effects (Headley, 1982), although such knowledge is<br />
essential in view of their paired use as the standard nerve agent<br />
antidote.<br />
Chemical warfare in tropic and desert areas also creates<br />
problems due to heat stress, especially when troops must wear<br />
MOPP-IV chemical protective clothing, since the total<br />
encapsulation of that ensemble traps heat and body moisture.<br />
This paper reports subjective symptoms, mood changes, and<br />
cognitive performance observed during a research project on the<br />
effects of heat exposure, atropine/2-PAM administration, and<br />
wearing of both the BDU and MOPP-IV ensembles. The overall<br />
project consisted of two separate studies which were identical<br />
except that the BDU ensemble was worn in one of the studies, and<br />
the MOPP-IV ensemble was worn in the other study.<br />
Study 1. Effects of Atropine/2-PAM and Heat on Symptomatic, Mood,<br />
and Cognitive Reactions While Wearing the BDU Ensemble<br />
Method<br />
Fifteen male soldiers, ages 18-32 years, were screened<br />
medically and were tested for normal vision and hearing. They<br />
were trained intensively 6 hours daily for 5 consecutive days on<br />
a battery of performance tasks and then performed the tasks on 4<br />
separate test days, each day corresponding to one of the<br />
following experimental test conditions: (a) control (saline<br />
placebo, 70°F [21.1°C], 30% RH); (b) drug only (2 mg atropine,<br />
600 mg 2-PAM, 70°F [21.1°C], 30% RH); (c) ambient heat only<br />
(saline placebo, 95°F [35°C], 60% RH); and (d) drug and ambient<br />
heat (2 mg atropine, 600 mg 2-PAM, 95°F [35°C], 60% RH). On each<br />
test day, the soldiers received either atropine/2-PAM or<br />
equivalent volumes of saline placebo, injected into the thigh<br />
muscle by 22-gauge syringes. Drug conditions were double-blind;<br />
however, the study medical monitor knew the identities of both<br />
drug and placebo participants. Test days were separated by at<br />
least three days for recovery from the preceding drug conditions.<br />
Daily testing began 30 min after drug administration.<br />
Participants attempted to complete three cycles of the<br />
performance tasks each testing day, and performed until either<br />
they withdrew voluntarily or were removed by the medical monitor.<br />
Cycles began at standard 2-hr intervals to maintain uniformity of<br />
daily heat exposure. Participants were allowed to drink water ad<br />
lib from standard military canteens; lunch and snacks were<br />
omitted.<br />
Three subjective tests were administered periodically during<br />
each experimental session: (a) the US Army Research Institute of<br />
Environmental Medicine Environmental Symptoms Questionnaire (ESQ;<br />
Kobrick & Sampson, 1979), as modified by Kobrick, Johnson, and<br />
McMenemy (1988); (b) the Profile of Mood States (POMS; McNair,<br />
Lorr, & Droppelman, 1981); and (c) the Brief Subjective Rating<br />
Scale (BSRS; Johnson, 1981). The ESQ is a self-rating inventory<br />
for sampling subjective reactions and medical symptomatology<br />
during exposure to environmental and other stressors. The POMS<br />
is a rating scale of 65 items to assess 6 mood states (tension,<br />
depression, anger, vigor, fatigue, confusion). The BSRS<br />
appraises subjective feelings of warmth, discomfort, and<br />
tiredness on separate rating scales by selection of descriptive<br />
words or phrases. The ESQ and POMS were given once at the end of<br />
each daily session. The BSRS was given once at the beginning of<br />
each session (30 min post-injection) and once at the end of each<br />
cycle (150, 270 and 390 min post-injection).<br />
Participants performed the following cognitive tasks in each<br />
2-hour testing cycle: (1) verbal reasoning - judging the<br />
correctness of grammatical transformations (Baddeley Grammatical<br />
Reasoning Test, 1968); (2) simple reaction time - pressing a key<br />
to the onset of a signal light; (3) choice reaction time -<br />
pressing one of two keys to the onset of one of two signal<br />
lights; (4) digit-symbol substitution - substituting code symbols<br />
for their symbol counterparts (Digit Symbol Substitution Test,<br />
Wechsler, 1955); (5) speech intelligibility - correctly<br />
identifying spoken words among other similar words (Modified<br />
Rhyme Test, House et al., 1965).<br />
Results<br />
The group mean ratings for each of the 68 ESQ items in each<br />
of the four test conditions showed the fewest severe symptoms in<br />
the control condition. The two heat conditions generated more<br />
symptoms, and different patterns of incidence related to heat.<br />
Atropine/2-PAM generated high ratings on symptoms usually<br />
attributed to those drugs (dry mouth, thirst). Drug/heat, the<br />
most severe test condition, generated the greatest number of high<br />
ratings. Headache and lightheadedness were reported only under<br />
drug/heat.<br />
Two-way (Temperature x Drug) analyses of variance (ANOVAs)<br />
on the POMS ratings showed significant main effects for both<br />
temperature and drug, acting to increase tension (F(1,14) =<br />
5.36, p < .05).<br />
5.36,E
On the POMS, two-way ANOVAs for repeated measures on each of<br />
the scores showed significant drug and temperature main effects<br />
and significant Drug x Temperature interactions, indicating that<br />
the drug led to feelings of tension (F(1,7) = 7.06, p < .05)<br />
performed, to elicit early reactions prior to withdrawal. The ESQ<br />
and POMS could not be analyzed in this manner because they were<br />
given only once at the end of each test day. Significant<br />
temperature main effects were found for warmth (F(1,7) = 37.19,<br />
p < .05).<br />
Journal of Clinical Pharmacology, 2, 367-368.<br />
House, A. S., Williams, C. E., Hecker, M. H. L., & Kryter,<br />
K. (1965). Articulation testing methods: Consonantal<br />
differentiation with a closed response set. Journal of<br />
the Acoustical Society of America, 37, 158-166.<br />
Johnson, R. F. (1981). The effects of elevated ambient<br />
temperature and humidity on mental and psychomotor<br />
performance. In Handbook of the Thirteenth Commonwealth<br />
Defense Conference on Operational Clothing and Combat<br />
Equipment (pp. 152-153). Kuala Lumpur, Malaysia:<br />
Government of Malaysia.<br />
Kobrick, J. L., Johnson, R. F., & McMenemy, D. J. (1988).<br />
Nerve agent antidotes and heat exposure: Summary of<br />
effects on task performance of soldiers wearing BDU and<br />
MOPP-IV clothing systems (Technical Report T1-89).<br />
Natick, MA: US Army Research Institute of Environmental<br />
Medicine. (DTIC Accession No. A 206-222)<br />
Kobrick, J. L., Johnson, R. F., & McMenemy, D. J. (1990).<br />
Effects of nerve agent antidote and heat exposure on<br />
soldier performance in the BDU and MOPP-IV ensembles.<br />
<strong>Military</strong> Medicine, 155, 159-162.<br />
Kobrick, J. L., & Sampson, J. B. (1979). New inventory for<br />
the assessment of symptom occurrence and severity at<br />
high altitude. Aviation Space and Environmental<br />
Medicine, 50, 925-929.<br />
Kolka, M. A., Stephenson, L. A., Bruttig, S. P., Cadarette,<br />
B. S., & Gonzalez, R. R. (1987). Human thermoregulation<br />
after atropine and/or pralidoxime administration.<br />
Aviation Space and Environmental Medicine, 58, 545-549.<br />
Marzulli, F. N., & Cope, O. P. (1950). Subjective and<br />
objective study of healthy males injected<br />
intramuscularly with 1, 2 and 3 mg atropine sulfate<br />
(Medical Division Research Report No. 24). Aberdeen,<br />
MD: US Chemical Corps, Army Chemical Center.<br />
McNair, D. M., Lorr, M., & Droppelman, L. F. (1982). EITS<br />
manual for the Profile of Mood States. San Diego, CA:<br />
Education and Industrial <strong>Testing</strong> Service.<br />
Moylan-Jones, R. J. (1969). The effect of a large dose of<br />
atropine upon the performance of routine tasks. British<br />
Journal of Pharmacology, 37, 301-305.<br />
Penetar, D. M., & Henningfield, J. E. (1986). Psychoactivity<br />
of atropine in normal volunteers. Pharmacology and<br />
Biochemistry of Behavior, 24, 1111-1113.<br />
Taylor, P. (1980). Anticholinesterase agents. In A. G.<br />
Gilman, L. S. Goodman, & A. Gilman (Eds.), The<br />
pharmacological basis of therapeutics (6th ed., pp. 100-<br />
119). New York: Macmillan.<br />
Wetherell, A. (1980). Some effects of atropine on short-term<br />
memory. British Journal of Clinical Pharmacology, 10,<br />
627-628.<br />
GUTS: A BELGIAN GUNNER TESTING SYSTEM<br />
F. LESCREVE<br />
W. SLOWACK<br />
CRS - Belgian Armed Forces<br />
Brussels<br />
1. Introduction<br />
To fulfil the need for expert selection of gunners, the Belgian Army<br />
has developed a selection simulator. First a job analysis of different<br />
weapon systems was completed. This was the basis for the construction of<br />
GUTS. Different physical and psychological stressors are important.<br />
2. Theoretical Background<br />
GUTS is constructed from a holistic point of view. We chose to put<br />
the candidate gunners in a complete, real-life-like situation instead of<br />
confronting them with different subtasks from the gunner's job, one at<br />
a time.<br />
3. Job Analysis<br />
The following weapon systems were carefully observed to extract the<br />
crucial task components:<br />
- Leopard-tank<br />
- CVRT (Combat Vehicle Reconnaissance Tracked)<br />
- GEPARD (Anti-Aircraft tank)<br />
- HAWK (Anti-Aircraft Missile)<br />
- JPK (Jacht Panzer Kanone)<br />
- AIFV (Armored Infantry Fighting Vehicle)<br />
- MILAN (Missile Leger Antichar)<br />
a. Tasks<br />
The following tasks were common to practically all the weapon<br />
systems.<br />
1. Knowledge of procedures.<br />
2. Ranging and target recognition.<br />
3. Target engagement and acquisition.<br />
4. Target identification.<br />
5. Choice of ammunition.<br />
6. Loading of ammunition.<br />
7. Tracking and firing.<br />
The working space of the gunner was measured. For the construction<br />
of the simulator we took the average of the measures of the different<br />
weapon systems.<br />
216
b. Stressors<br />
An analysis was made of the possible physical and psychological<br />
stressors.<br />
1. Physical Stressors<br />
1. Limited working space.<br />
2. Heat caused by instruments, engine, clothing.<br />
3. Vibration due to the vehicle movements.<br />
4. Noise, especially in a war situation.<br />
5. Darkness.<br />
2. Psychological Stressors<br />
1. Overload of information, visual and auditory.<br />
2. Permanent concentration needed.<br />
3. Time pressure.<br />
4. Unexpected events.<br />
5. Feeling of isolation.<br />
6. Feeling of claustrophobia.<br />
c. Ability and Aptitude Requirements<br />
Based on the task analysis and the inventory of the different<br />
stressors, it is clear that several ability and aptitude requirements<br />
are needed for being a good gunner. As we shall see later, the<br />
different requirements are also needed for a good performance in GUTS.<br />
Requirements :<br />
- Learning ability.<br />
- Memory.<br />
- Reaction time.<br />
- Motor coordination.<br />
- Stress management.<br />
- Concentration.<br />
4. Construction of GUTS<br />
The aim was to incorporate the different tasks and stressors in the<br />
selection simulator. We did this in the following design.<br />
a. The Cabin<br />
For the size of the working space we used the average of the<br />
measurements of the different weapon systems. The following sizes were taken<br />
into consideration: the depth of the working space, the space for the<br />
head movements, size of the seat, distance between look-hole and the<br />
handle, distance between head and top of the cabin, space for the legs,<br />
and distance between seat and top of the cabin.<br />
b. The Instruments<br />
The instruments in the simulator are life-like copies of real<br />
instruments. We discuss here only the most important ones, which are<br />
indicated in the figure.<br />
1. Lookhole : Through the lookhole you can see the screen of the<br />
computer on which you see a landscape with the<br />
different targets. You also see the circle for the<br />
engagement of the targets and the reticle for the<br />
tracking of the targets.<br />
2. Identification box :This box is used for identifying the targets<br />
: 6 possibilities.<br />
3. Control box : With this box you start the whole procedure.<br />
4. Radio box : Here are the headphones for the candidate connected.<br />
In this way the candidate receives his weapon<br />
control orders. There are also a lot of disturbing<br />
sounds and non important conversations on the radio.<br />
5. Ammunition box : To choose the type of ammunition depending on<br />
distance and sort of the target : 4 possibilities.<br />
6. Heating device : By means of a thermostat the temperature in the<br />
simulator rises to 30°C.<br />
11. Loudspeakers : The loudspeakers at the bottom of the cabin<br />
produce war-sounds. By making low frequency sounds<br />
they produce a disturbing vibration.<br />
15. Handle : The candidate must use the handle in order to engage<br />
a target with the circle he has on his screen,<br />
tracking a target by means of the reticle and firing<br />
with the fire buttons. The handle can move in all<br />
directions.<br />
The candidate gunner has to wear a gasmask. This is connected by a<br />
tube to a valve. Every five minutes the air supply is cut off for five<br />
seconds.<br />
5. The Test Session<br />
a. Learning the Test Instructions<br />
The goal of the test session is explained to the candidate. He gets a<br />
description of all the instruments. He has to learn the different kinds<br />
of tanks, the ammunition, the identification procedures and the<br />
engagement procedures. Special attention is paid to the weapon control<br />
orders (WCO).<br />
218
b. The Test<br />
After a short demonstration of the instruments of the simulator, the<br />
candidate puts on a gasmask and a battle dress and climbs into the cabin.<br />
The actual test consists of 3 identical cycles. Every cycle has 4<br />
periods, one period for each WCO. The test takes 30 minutes.<br />
c. The Engagement-firing Cycle<br />
To eliminate a target, a candidate must follow a strict procedure.<br />
1. Engagement of the target : steering the circle on the target<br />
and pushing the engagement buttons on the handle.<br />
2. Selection of ammunition, depending upon the kind of target and<br />
the distance of the target.<br />
3. Loading the ammunition<br />
4. Confirmation of the ammunition.<br />
5. Aiming by putting the reticle on the target.<br />
6. Firing, by pressing the firing buttons on the steering gear.<br />
The engagement-firing cycle must be repeated for each target. The<br />
computer (APPLE MAC II) records all the actions of the candidate.<br />
d. Discussion<br />
A test session in GUTS is a demanding experience. This is caused by the<br />
physical and the psychological stressors. The candidates come out<br />
sweating.<br />
6. Results<br />
The candidates are scored in the following categories:<br />
a. Time between appearance of a target and firing at a target.<br />
b. Decision errors : engaging the wrong target.<br />
c. Manipulation errors : for example choosing the wrong ammunition.<br />
d. Procedure errors : errors made in the engagement-firing cycle.<br />
e. Number of hits.<br />
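Since the computer records every candidate action, the error categories above could be scored automatically. The sketch below is hypothetical: the log format and field names are invented for illustration, not taken from the actual GUTS software.<br />

```python
# Hypothetical scoring sketch for a GUTS-style action log. The required
# order mirrors the engagement-firing cycle described in the text.
REQUIRED_SEQUENCE = ["engage", "select_ammo", "load_ammo",
                     "confirm_ammo", "aim", "fire"]

def score_engagement(log):
    """log: list of (action, correct) tuples for one target. Returns
    counts of procedure errors (actions out of the required sequence)
    and manipulation errors (wrong choices, e.g. wrong ammunition)."""
    procedure_errors = sum(
        1 for i, (action, _) in enumerate(log)
        if i >= len(REQUIRED_SEQUENCE) or action != REQUIRED_SEQUENCE[i])
    manipulation_errors = sum(1 for _, correct in log if not correct)
    return {"procedure": procedure_errors,
            "manipulation": manipulation_errors}

# The candidate aimed before confirming and chose the wrong ammunition:
log = [("engage", True), ("select_ammo", False), ("load_ammo", True),
       ("aim", True), ("confirm_ammo", True), ("fire", True)]
print(score_engagement(log))  # {'procedure': 2, 'manipulation': 1}
```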
7. Validation of GUTS<br />
In 1991 a study will be carried out concerning the reliability and<br />
validity of GUTS.<br />
CHARACTERIZING RESPONSES TO STRESS UTILIZING DOSE EQUIVALENCY METHODOLOGY<br />
Robert S. Kennedy, Essex Corporation; William P. Dunlap, Tulane University;<br />
Janet J. Turnage, University of Central Florida;<br />
Jennifer E. Fowlkes, Essex Corporation*, Orlando, FL<br />
INTRODUCTION<br />
One of the chief problems in quantifying the effects of stressors on<br />
operational performance, such as heat and combat stress, is the lack of<br />
reliability in the criterion tasks. To circumvent the problems which hinder<br />
development of a quantitative definition of workforce performance decrement,<br />
we offer two methodologies: surrogate measurements and dose equivalency<br />
testing.<br />
Surrogate Measurement<br />
Insufficient attention to reliability can lead to attenuated validities,<br />
reduction of statistical power, higher sample size requirements, increased<br />
cost of experiments, and when hazard or discomfort is involved, human use<br />
problems. These problems focused us on development of highly reliable measure<br />
sets such as may be obtained with microcomputer-based mental acuity tests<br />
(Kennedy, Baltzley, Lane, Wilkes, & Smith, 1989; Kennedy, Wilkes, Lane, &<br />
Homick, 1985). We recognized these are separate from the operational<br />
criteria, but highly similar to the criteria in skill requirements. We<br />
reasoned that, if the measures correlate well with the criteria and behave<br />
similarly under changing task conditions, perhaps they could be used as a<br />
surrogate in place of the criteria. We called this approach “surrogate<br />
measurement” (Lane, Kennedy, & Jones, 1986) and listed requirements for<br />
surrogate tests as those which are related to or predictive of real-world<br />
performances but are not actual measures of the performance per se. In our<br />
plan, surrogate measures are composed of tests or batteries that exhibit five<br />
characteristics:<br />
1. Stable, so that the “what is being measured” is constant;<br />
2. Correlated with the operational performance;<br />
3. Sensitive to the same factors that would affect performance as the<br />
performance variable would;<br />
4. More reliable than field measures; and<br />
5. Involving minimal training time.<br />
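Two of these requirements, stability and correlation with the operational criterion, can be checked directly from score data. A minimal sketch, with invented scores and an assumed 0.70 acceptance threshold:<br />

```python
# Screen a candidate surrogate against two of the listed requirements:
# test-retest stability and correlation with the operational criterion.
# Pearson's r computed by hand; data and the 0.70 threshold are invented.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def acceptable_surrogate(session1, session2, criterion, threshold=0.70):
    """Stable (test-retest r) and correlated with the criterion."""
    return (pearson_r(session1, session2) >= threshold and
            pearson_r(session2, criterion) >= threshold)

session1 = [10, 12, 14, 18, 20]      # surrogate battery, first session
session2 = [11, 12, 15, 17, 21]      # same battery, retest
field_scores = [30, 33, 36, 41, 44]  # operational criterion measure
print(acceptable_surrogate(session1, session2, field_scores))
```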
An obvious candidate for a surrogate to measure military performance would<br />
be the Armed Services Vocational Aptitude Battery (ASVAB). ASVAB scores are<br />
used to determine eligibility for various military occupational specialties<br />
based on construct validity and continuing programs of empirical studies. The<br />
tests of the ASVAB also have considerable content and criterion-related<br />
validity, including training performances at military formal schools as well<br />
as operational performance studies. In at least one case (Wallace, 1982),<br />
performances during war games with tank forces were correlated with subtest<br />
scores from the ASVAB better than with any other variable in the study. But<br />
the ASVAB is not meant to be administered repeatedly. If it could be shown<br />
that the ASVAB was highly correlated with a repeated measurement test battery,<br />
*Dr. Fowlkes is now employed at Engineering and Economics Research, Orlando, FL<br />
then by the principle of transitivity (things equal to the same thing are<br />
equal to each other), one might link changes in the surrogate with changes in<br />
the operational performance about which one wished to make statements.<br />
Dose Equivalency<br />
A second methodology that can be employed in studies of real-world<br />
performance is called “dose equivalency.” Dose Equivalency is a strategy used<br />
in conjunction with surrogate measures in order to quantify degradation of<br />
operational performance by selecting an indexing agent(s) or treatment and a<br />
set of target performance tasks. Then graded “dosages” of the indexing agent<br />
are administered and performance decrements as a function of the indexing<br />
agent are marked or calibrated against the various dosages. One is left with<br />
a functional relationship between an agent and performance(s).<br />
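The calibration step described above can be sketched as an ordinary least-squares fit of performance decrement against graded dosage; all values and names below are hypothetical:<br />

```python
def fit_line(doses, decrements):
    """Ordinary least-squares slope and intercept for a linear
    dose-response calibration of decrement against dosage."""
    n = len(doses)
    mx = sum(doses) / n
    my = sum(decrements) / n
    sxx = sum((d - mx) ** 2 for d in doses)
    sxy = sum((d - mx) * (y - my) for d, y in zip(doses, decrements))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical calibration data: percent performance decrement
# observed at graded dosages of an indexing agent (here, BAL).
doses = [0.000, 0.050, 0.075, 0.100, 0.125, 0.150]
decr = [0.0, 2.0, 5.0, 9.0, 13.0, 17.0]
slope, intercept = fit_line(doses, decr)

def decrement_to_dose(decrement):
    """Invert the fitted line: express any later observed decrement
    in units of the indexing agent's dosage."""
    return (decrement - intercept) / slope
```

The inversion in `decrement_to_dose` is what makes the functional relationship usable as an index: a decrement produced by some other stressor can be read off in dosage units.<br />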
This strategy has been applied in a study we conducted using different<br />
dosages of orally-administered alcohol (Kennedy, Baltzley, Lane, Wilkes, &<br />
Smith, 1989). Alcohol was selected as the indexing agent for several<br />
reasons: 1) alcohol is known to be a global depressant, having wide-ranging<br />
and well-documented impacts on performance and operational readiness; 2) much<br />
research has been directed to the identification and calibration of what are<br />
to be considered “safe” and “unsafe” doses of this agent; 3) equipment and<br />
assay procedures are readily available for calibrating both blood alcohol<br />
levels (BAL) and alcohol detected in expired breath (breathalyzer); and 4)<br />
because alcohol is widely<br />
used, it is feasible to administer to male subjects who, by self-report, use<br />
alcohol to a moderate degree, thereby obviating potential threat to volunteers<br />
and meeting requirements for ethical treatment of subjects in human<br />
experimental research.<br />
EXPERIMENT 1 - APTS AS SURROGATE FOR ASVAB SUBTEST<br />
In this analysis, 16 women and 11 men ranging in age from 18 to 38 were<br />
tested with a synthetic Armed Services Vocational Aptitude Battery (ASVAB)<br />
(Steinberg, 1986), and the microcomputer-based assessment used was the<br />
Automated Performance Test System (APTS), which is fully described in Kennedy<br />
et al. (1989). Seven of the tests used were from the original APTS (Pattern<br />
Comparison: Two-Finger and Nonpreferred Hand Tapping: Code Substitution:<br />
Simple Reaction Time: Grammatical Reasoning: and Manikin) and four additional<br />
subtests were selected from the Unified Tri-Service Committee Performance<br />
Assessment Battery (UTC-PAB) (Englund, Reeves, Shingledecker, Thorne, Wilson,<br />
& Hegge, 1987).<br />
The most dramatic findings were the consistently high reliabilities of the<br />
battery subtests; the smallest reliability was 0.85, which in our judgment is<br />
sufficient for statistical power and differential purposes.<br />
Scores on the performance battery were averaged across the seven trials<br />
and then correlated with the subscales and total score from the ASVAB.<br />
Multiple regression was used to examine the predictive power of the battery as<br />
a whole on the total ASVAB criterion. The multiple R was 0.94 (R2 = 0.88),<br />
and, even when corrected for shrinkage, the multiple R was 0.88. This<br />
indicates that when shrinkage owing to the particular sample used is taken<br />
into account, 77% of the ASVAB variance is explained by performance on the<br />
battery subtests.<br />
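The shrinkage correction mentioned above is conventionally computed with a Wherry-type adjustment; whether the authors used this exact variant is an assumption, though the sample size (27 subjects) and subtest count (11) are as reported above:<br />

```python
import math

def wherry_adjusted_r(r, n, k):
    """Wherry-type shrinkage correction: adjusts a sample multiple R
    for the number of predictors k and sample size n, estimating the
    population multiple correlation.  Clamped at zero if the adjusted
    R-squared goes negative."""
    r2_adj = 1.0 - (1.0 - r ** 2) * (n - 1) / (n - k - 1)
    return math.sqrt(max(r2_adj, 0.0))

# With a small sample and many predictors, a multiple R of 0.94
# shrinks noticeably, as in the analysis reported above.
print(round(wherry_adjusted_r(0.94, 27, 11), 2))
```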
A second multiple regression analysis was conducted including the<br />
candidate surrogate performance subtests - those that would be used in the<br />
second study - Code Substitution, Pattern Comparison, Grammatical Reasoning,<br />
Manikin, Math Processing, Two-Finger Tapping, Non-Preferred Hand Tapping, and<br />
Reaction Time. The multiple R was .92, indicating that we lost very little<br />
common variance with the ASVAB by using the shortened surrogate battery.<br />
EXPERIMENT 2 - INDEXING AGENT (ALCOHOL) ADMINISTERED<br />
TO EXPERIMENTAL SAMPLE*<br />
Method<br />
Subjects. Male students, ranging in age from 21 to 42, were recruited as<br />
subjects. Acceptable candidates were those indicating some, but not<br />
excessive, experience with alcohol, no past history of chronic dependency of<br />
any type, good general health, and indications of low risk for future<br />
alcohol-based problems. Students indicating problem family histories of<br />
chemical abuse/dependency and/or past personal histories of chemical<br />
abuse/dependency were advised not to participate. Various paper-and-pencil<br />
and computer software materials were employed in screening and assessing the<br />
individual subjects and are discussed in detail elsewhere along with the<br />
criteria employed in Kennedy, Baltzley, Lane, Wilkes, and Smith (1989).<br />
Microcomputer testing was conducted with eight identical NEC PC-8201A<br />
microcomputers, and the Intoximeter Model 3000 breath analyzer was used to<br />
estimate alcohol concentrations in the blood.<br />
Procedure<br />
Alcohol was consumed in a group setting, with subjects completing the<br />
drinking in times ranging from several minutes to slightly more than one hour. Each subject<br />
was brought to 0.15 BAC and monitored as the descending limb of the BAL curve<br />
was achieved.<br />
Upon completing data collection, subjects were returned to supervised<br />
housing where they were required to stay for the remainder of the evening and<br />
abstain from further consumption of alcohol. Upon wakening the following<br />
morning, subjects self-administered one battery of the APTS. This “hangover”<br />
measure was completed within one hour of wakening and all measures were<br />
finalized by 9:30 A.M. The hangover measure typically occurred within 13 to<br />
17 hours of the pre-alcohol APTS measure taken the previous day.<br />
Results<br />
The means for each of the APTS performance measures were monotonically<br />
related to blood alcohol levels and all were significant (p < .001). Figure 1<br />
depicts the performance measures for a sample subtest (Code Substitution) in<br />
*Many of the technical details regarding methods, procedures, and safeguards<br />
in studying the effects of orally administered alcohol on APTS performance<br />
were worked out in previous research and are described extensively elsewhere<br />
(Kennedy, Wilkes, and Rugotzke, 1989).<br />
the order they were obtained. Following the alcohol challenge, performance<br />
dropped dramatically on all subtests, then recovered, in most cases in a<br />
monotonic or near-monotonic function, as determined by BAL during the period<br />
of alcohol metabolism. If one were to choose a single subtest to index BAL,<br />
Code Substitution would be a likely candidate. For this test it can be seen<br />
that each change of one hundredth of a percent BAL is indexed by a change of<br />
approximately 1.5 points on the Code Substitution task.<br />
Figure 1. Code Substitution Number Correct for<br />
Baseline and Blood Alcohol Levels As Shown<br />
Formulation of the Quantitative Dose Equivalency Model<br />
Multiple regression was used to develop scores that maximally predicted<br />
BAL. The multiple R for BAL as predicted from all nine surrogate battery<br />
subtests was 0.77. Subsequent stepwise regression analysis revealed<br />
that an optimally selected subset of only four of the subtests produced a<br />
multiple R of 0.765; therefore, virtually no loss in predictive power resulted<br />
from use of the shortened battery. When this latter coefficient is corrected<br />
for shrinkage, R equalled 0.75; therefore, 57% of the variance in blood<br />
alcohol is predictable from the four subtest battery. The resulting<br />
regression equation (simplified by rounding to whole numbers) is shown in<br />
Equation (1):<br />
BAL = 0.3 - (9CS + 2GR + 5MP + 6TFT)/1000, (1)<br />
where CS, GR, MP, and TFT refer to percent decrement from baseline of Code<br />
Substitution, Grammatical Reasoning, Math Processing, and Two-Finger Tapping,<br />
respectively.<br />
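Equation (1) can be applied directly; a minimal sketch, with hypothetical decrement values (the Math Processing coefficient is read here as 5):<br />

```python
def predicted_bal(cs, gr, mp, tft):
    """Equation (1): predicted blood alcohol level from percent
    decrement relative to baseline on Code Substitution (cs),
    Grammatical Reasoning (gr), Math Processing (mp), and Two-Finger
    Tapping (tft), using the rounded coefficients given in the text."""
    return 0.3 - (9 * cs + 2 * gr + 5 * mp + 6 * tft) / 1000

# Hypothetical decrement profile for one testing session:
print(round(predicted_bal(cs=12, gr=8, mp=10, tft=15), 3))
```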
To further demonstrate how the four-test surrogate battery identified by the<br />
above research can serve as a bridge between alcohol (the indexing agent) and<br />
military performance readiness (the synthetic ASVAB), we computed one further<br />
regression equation from the synthetic ASVAB data described above. Equation<br />
(2) predicts the ASVAB (scaled with mean = 100 and SD = 15) from Code<br />
Substitution, Grammatical Reasoning, and Math Processing. Two-Finger Tapping<br />
was not used as its beta weight in the equation was quite low. The equation<br />
is:<br />
ASVAB = .92CS + .42MP + .15GR + 26 (2)<br />
where CS, MP, and GR are raw scores for Code Substitution, Math Processing,<br />
and Grammatical Reasoning, respectively. Using this equation to fit the data<br />
from the alcohol study, we can represent the performance decrements produced<br />
by the various BAL levels relative to a metric based on a standardized ASVAB<br />
as follows. These relationships are shown in Table 1.<br />
Table 1. Predicted Standardized ASVAB Means and Standard Deviations<br />
from Surrogate Battery Performance as a Function of Blood Alcohol Level<br />
BAL Mean ASVAB SD<br />
Baseline 103.6 12.3<br />
0.050 104.7 13.4<br />
0.075 101.8 12.7<br />
0.100 96.5 13.9<br />
0.125 90.3 15.9<br />
0.150 85.9 13.4<br />
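Equation (2) can likewise be applied directly; the raw scores below are hypothetical, not values from Table 1:<br />

```python
def predicted_asvab(cs, mp, gr):
    """Equation (2): synthetic ASVAB score (mean 100, SD 15) predicted
    from raw scores on Code Substitution (cs), Math Processing (mp),
    and Grammatical Reasoning (gr)."""
    return 0.92 * cs + 0.42 * mp + 0.15 * gr + 26

# Hypothetical raw scores at baseline and after an alcohol challenge;
# the difference expresses the decrement in ASVAB units, which is the
# bridging computation behind Table 1.
baseline = predicted_asvab(cs=60, mp=45, gr=55)
dosed = predicted_asvab(cs=52, mp=40, gr=50)
print(round(baseline - dosed, 2))
```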
CONCLUSION<br />
The objective of the effort reported herein was to provide a quantitative<br />
methodology to permit assessment of performance degradation in humans which<br />
may result from exposure to toxic or stressor agents encountered on the<br />
battleEield. The scientific literature has shown that performance on the<br />
ASVAB is correlated with military job performance, and tests from a<br />
microcomputer test battery have been developed which are sensitive to gases<br />
like halon and to various toxic agents. Using these relations, the present<br />
analyses were performed, the results of which are:<br />
o Performances on APTS subtests are correlated with subtests of a synthetic<br />
ASVAB.<br />
o Increasing dosages of alcohol result in monotonically greater performance<br />
decrements on APTS subtests.<br />
o The performance decrements can be indexed to percent blood alcohol via a<br />
linear regression equation.<br />
o Performance decrement on APTS can be indexed to performance decrement on<br />
aptitude tests via a linear regression equation.<br />
o Performance equivalency and dose equivalency relationships were<br />
successfully demonstrated so that:<br />
- a regression equation can be created which translates reductions in<br />
APTS performance due to any treatment (such as an irritant gas or<br />
psychological stress) into ASVAB-equivalent performances, and<br />
- a regression equation can be created which translates reductions in<br />
performance due to such agents or treatments into units of percent<br />
blood alcohol.<br />
ACKNOWLEDGMENTS<br />
Support for this research was from the U.S. Army Medical Research<br />
Acquisition Activity under Contract DAMD17-89-C-9135. The views, opinions,<br />
and/or findings contained in this report are those of the authors and should<br />
not be construed as an official Department of the Army position, policy, or<br />
decision unless so designated by other documentation. The authors are<br />
indebted to Gene G. Rugotzke for conducting the blood alcohol analysis and to<br />
Robert L. Wilkes for collection of APTS data.<br />
REFERENCES<br />
Englund, C. E., Reeves, D. L., Shingledecker, C. A., Thorne, D. R., Wilson,<br />
K. P., & Hegge, F. W. (1987). Unified Tri-Service Cognitive Performance<br />
Assessment Battery (UTC-PAB): I. Design and specification of the battery,<br />
Report No. 87-10. San Diego, CA: Naval Health Research Center.<br />
Kennedy, R. S., Baltzley, D. R., Lane, N. E., Wilkes, R. L., & Smith, M. G.<br />
(1989). Development of microcomputer-based mental acuity tests: Indexing<br />
to alcohol dosage and subsidiary problems (Final Report, Grant No. ISI-<br />
8521282). Washington, DC: National Science Foundation.<br />
Kennedy, R. S., Wilkes, R. L., Lane, N. E., & Homick, J. L. (1985). Preliminary<br />
evaluation of a microbased repeated measures testing system,<br />
Technical Report (EOTR 85-1). Orlando: Essex Corporation.<br />
Kennedy, R. S., Wilkes, R. L., & Rugotzke, G. G. (1989). Cognitive performance<br />
deficit regressed on alcohol dosage. Proceedings of the 11th International<br />
Conference on Alcohol, Drugs, and Traffic Safety (p. C-27).<br />
Chicago, IL.<br />
Lane, N. E., Kennedy, R. S., & Jones, M. B. (1986). Overcoming unreliability in<br />
operational measures: The use of surrogate systems. Proceedings of the<br />
30th Annual Meeting of the Human Factors Society. Santa Monica, CA: Human<br />
Factors Society.<br />
NEC Home Electronics (USA). (1983). NEC PC-8201A Users Guide. Tokyo: Nippon<br />
Electric Co., Ltd.<br />
Steinberg, E. P. (1986). Practice for the Armed Services test. New York: Arco<br />
Publishing Co.<br />
Wallace, J. R. (1982). The Gideon criterion: The effects of selection criteria<br />
on soldier capabilities and battle results, Research Memorandum 82-1.<br />
Fort Sheridan: U.S. Army Recruiting Command, Research, Studies and<br />
Evaluation Division, Program Analysis and Evaluation Directorate. (NTIS<br />
No. AD A127 975)<br />
Job Sets for Efficiency in Recruiting and Training (JSERT)’<br />
Jane M. Arabian and Amy C. Schwartz’<br />
U.S. Army Research Institute<br />
for the Behavioral and Social Sciences<br />
Alexandria, VA<br />
The Army is facing radical changes brought about by the reduction in the size of its<br />
force. The challenges encountered by the Army will require different and more efficient<br />
ways of going about the business of recruiting, selecting and classifying young men and<br />
women as they enter the service. Changes in enlisted end strength will have a dynamic<br />
impact on, for example, MOS fill and training seat utilization. In the past, changes in<br />
authorizations have caused a manpower surplus or shortage in various MOS. The delayed<br />
entry program (DEP) has not been able to provide enough flexibility to compensate for<br />
such near-term authorization changes. Therefore, the Army has begun to evaluate the<br />
potential for a “job sets” concept to improve manpower and personnel management by<br />
fostering more timely, accurate personnel classification.<br />
This paper will describe the rationale and tailoring of the JSERT concept to the<br />
particulars of the Army’s current manpower and personnel environment. The general<br />
approach was to devise two parallel tracks: 1) the pragmatic identification of occupations<br />
(MOS) which would comprise a given “Job Set” and 2) an empirical research program for<br />
confirming the “Job Sets”, devising a means for selecting an appropriate classification test<br />
battery, and developing a feedback/appraisal system for the JSERT concept.<br />
Currently, in the vast majority of cases each recruit receives a contract for a specific<br />
occupation, such as M-1 turret mechanic. This contract is a legal commitment by the Army<br />
to provide the individual with the specific training for M-1 turret mechanics. This means<br />
that if the Army finds that it doesn’t need as many M-1 turret mechanics as it had estimated,<br />
or that it needs more Bradley turret mechanics than expected, contracts must be renegotiated<br />
and the individuals involved may decide not to enlist. This may be costly, both<br />
in terms of dollars and loss of desirable individuals for service.<br />
The Army has been able to accommodate small discrepancies in its estimates for<br />
personnel by tapping into the pool of recruits in the Delayed Entry Program (DEP).<br />
However, this does not always provide a satisfactory solution; individuals’ contracts still<br />
need to be honored. Given the anticipated changes in the size of the force and its<br />
composition, it is expected that it will become even more difficult to estimate accurately<br />
’ Paper presented at the 32nd Annual Conference of the Military Testing<br />
Association, November 1990, Orange Beach, AL.<br />
’ All statements expressed in this paper are those of the authors and do not<br />
necessarily express the official opinions or policies of the U.S. Army Research Institute<br />
or the Department of the Army.<br />
the Army’s near-term personnel needs and to make up for the differences with the DEP.<br />
More flexibility in manpower management and personnel assignment is needed. The<br />
development of “Job Sets”, as described below, would give the Army such flexibility.<br />
Grouping Jobs<br />
Many MOS have the same or very similar entry requirements (i.e., Armed Services<br />
Vocational Aptitude Battery (ASVAB) Aptitude Area [AA] composite score cut-offs and<br />
physical [e.g., vision] profiles). This is especially true of MOS in the same CMF or Career<br />
Management Field (such as Mechanical Maintenance). It would therefore seem possible<br />
to group such MOS into “sets” for recruiting and enlistment purposes. The Army would<br />
then be able to enlist soldiers as turret mechanics, for example, without specifying, at the<br />
time of enlistment, which type of turret mechanic training they would receive. This would<br />
give the Army just that much more manpower management flexibility. Much closer to the<br />
point of actually filling training seats, the Army would be able to determine which<br />
individual would receive which specific course of training.<br />
As with any change to an established system, implementation of this JSERT concept<br />
will cause disruptions and periods of awkward adjustment. However, the concept does fit<br />
well with other current Army cost-saving initiatives (e.g., consolidating MOS and reducing<br />
the number of reception battalions) and appears to offer important benefits. This is not<br />
to trivialize the adjustments that will need to be made by, for example, the recruiting<br />
and especially the training communities. Therefore, several steps have been taken to<br />
minimize the potential down-side of JSERT-related changes. These measures are described<br />
below.<br />
Identification and Coordination With Key Players<br />
Working closely with the Army’s Manpower and Personnel Management/Enlisted<br />
Accessions Policy office, key agencies and functions that would be affected by JSERT were<br />
identified. A “strawman” concept was circulated, briefed and discussed with each key player<br />
over a four-month period. The concept was refined and modifications were made based<br />
on the inputs received. One key refinement was the addition of parallel tracks, one for<br />
testbed implementation and one for research and development. The tracks will be<br />
described shortly.<br />
In addition to exploring the concept with Army personnel, the Air Force classification<br />
system was also examined. While many of the manpower and personnel issues faced by the<br />
two services are quite different, the JSERT concept is not drastically different from the<br />
Air Force’s current classification system.<br />
After these information gathering and coordination efforts, a key players research and<br />
planning meeting was held. This provided the opportunity for further explication of the<br />
JSERT concept, exchange of concerns, identification of roles and responsibilities, selection<br />
of candidate job sets for a testbed, and the joint determination of milestones for the<br />
implementation of the testbed.<br />
Parallel Tracks<br />
Given the general desire for a swift remedy to the manpower management difficulties,<br />
the development and execution of a comprehensive R & D program to address the issues<br />
raised by changing the Army’s recruiting, enlisting, and training systems was simply not<br />
feasible. Therefore, two tracks have been devised for the JSERT concept.<br />
Track One. The key feature of this track is that it is driven by practical considerations.<br />
In order to put the JSERT concept into practice as quickly as possible, jobs can be formed<br />
into sets based on “face validity”. A primary concern is to minimize disruption to the CMF<br />
structure, and to take into account logistic, training and cost considerations. Therefore, at<br />
least the initial job groupings would involve MOS that currently use the same aptitude area<br />
composites and cut scores, have the same proponents and are trained in the same location.<br />
The candidate MOS identified at the key players meeting represent a key milestone of<br />
the Track One effort. The candidate MOS have been circulated among the appropriate<br />
proponents for review and comment. Their input will be used to make the final<br />
determination of job sets for the testbed implementation.<br />
Track Two. While Track One is getting under way, the Track Two efforts have begun.<br />
Track Two efforts form an empirical, applied research program characterized by three<br />
primary features: 1) Within job set validity confirmation, 2) Classification battery<br />
selection, and 3) System appraisal/feedback.<br />
The job set validity confirmation consists of developing analytic tools or models to<br />
determine the fit of jobs with any given set. For example, attribute (skill/ability) or job task<br />
taxonomies can be used by individuals familiar with the jobs to provide “job profiles” ( e.g.,<br />
identification of the tasks making up a job and their importance or criticality). These<br />
profiles can then be compared across jobs and a judgement made as to the acceptability of<br />
the similarities or dissimilarities. If the profiles of the jobs appear too dissimilar or if one<br />
job stands out as too different from the other jobs then there would be a basis for<br />
eliminating some job(s) from the set.<br />
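A minimal sketch of this profile-comparison step, using cosine similarity over a shared attribute taxonomy; the jobs, importance ratings, and threshold are invented for illustration:<br />

```python
import math

def cosine(p, q):
    """Cosine similarity between two job profiles (importance ratings
    over a common attribute/task taxonomy)."""
    dot = sum(a * b for a, b in zip(p, q))
    np = math.sqrt(sum(a * a for a in p))
    nq = math.sqrt(sum(b * b for b in q))
    return dot / (np * nq)

def flag_outliers(profiles, threshold=0.90):
    """Flag jobs whose best similarity to any other job in the
    candidate set falls below the threshold, i.e., candidates for
    elimination from the job set."""
    flagged = []
    for name, p in profiles.items():
        best = max(cosine(p, q) for n, q in profiles.items() if n != name)
        if best < threshold:
            flagged.append(name)
    return flagged

# Hypothetical importance profiles over five shared task dimensions.
profiles = {
    "M-1 turret mechanic": [5, 4, 4, 2, 1],
    "Bradley turret mechanic": [5, 4, 3, 2, 1],
    "Supply clerk": [1, 1, 2, 5, 5],
}
print(flag_outliers(profiles))
```

In practice, a formal clustering algorithm of the kind mentioned under the SynVal-based effort would replace this single-threshold rule.<br />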
The job descriptions or profiles described above can also be used to help the Army<br />
identify additional classification tests. The elements of the profiles can be linked to tests<br />
of individual abilities (i.e., predictor tests). The tests may then be used to help place (or<br />
classify) individuals into jobs where they are most likely to perform well.<br />
In fact, as part of Project A, the Army’s comprehensive project to improve the selection<br />
and classification system, a new battery of predictor tests was developed. So now, in<br />
addition to ASVAB which measures primarily cognitive ability, the Army has the<br />
opportunity to assess an individual’s spatial and psychomotor abilities, temperament and<br />
vocational interests. The additional information provided by these tests can help the Army<br />
make better use of its human resources by improving the match between a soldier’s abilities<br />
and the job’s requirements.<br />
For this part of the JSERT program, research will be conducted to develop a<br />
methodology for building tailored classification batteries. These special batteries would<br />
be used for assigning soldiers within a job set to a particular job. The first requirement,<br />
however, would be to determine whether or not special classification testing is warranted.<br />
Since job performance can frequently be improved by selecting individuals with particular<br />
skills and/or by training particular skills, the trade-offs implied by testing for selection<br />
versus training would have to be considered.<br />
Two research efforts are currently planned to develop the tailored classification<br />
batteries. The first effort will expand upon the Cognitive Requirements Model (CRM)<br />
developed by Hay Systems, Inc. to include spatial and psychomotor elements. This new<br />
model, CRM+, will employ decision flow diagrams to guide job experts through the<br />
elements of the model. Attributes identified in this manner will be compared across jobs<br />
to determine the similarity and differences in job requirements. The same information will<br />
also be used to identify classification tests most likely to improve the person-job match<br />
within the job set.<br />
The second effort to develop tailored test batteries will build upon the research<br />
conducted in the Army’s Synthetic Validation Project (SynVal) by the American Institutes<br />
for Research with Personnel Decisions Research Inc. and the Human Resources Research<br />
Organization. Subject matter experts will be asked either to identify directly the importance<br />
and level of attributes for jobs or to identify job tasks from the Army Task Taxonomy.<br />
Either visual inspection of the resulting profiles or more formal clustering algorithms will<br />
then be used to compare the profiles. The profile elements may then be matched up with<br />
the predictor tests. The identified tasks can be linked, using the results of SynVal, to<br />
attributes and to the tests which measure those attributes (or the directly identified<br />
attributes may themselves be used to identify the appropriate predictor tests).<br />
An important consideration of both these efforts will be to develop valid procedures<br />
that are credible and can be employed by non-scientists. Indeed, it is expected that any<br />
additional testing that may be adopted by the Army will be administered, scored and used<br />
in the assignment decision-making process by Army personnel during basic training, prior<br />
to the start of Advanced Individual Training. Therefore, the procedures must be<br />
straightforward, non-technical, and cost-efficient. A demonstrable value for administering<br />
any additional tests (i.e., improved job performance, reduced training time, lower attrition)<br />
to off-set the costs and inconvenience of specialized testing must be clearly evidenced.<br />
The third JSERT research focus is on the development of an appraisal feedback system<br />
for the JSERT “system” itself. The goal of the feedback system is to monitor the<br />
performance of JSERT, not the job performance of individuals per se. Although ratings<br />
of performance or training needs may be solicited from supervisors and individual soldiers,<br />
the ratings would be used for research purposes or for operational changes to the JSERT<br />
system, not to affect the careers of the rated individuals. The concern is to set up a<br />
monitoring system so that if jobs change over time or there is a shift in the overall abilities<br />
of soldiers being enlisted, the Army would have some consistent means of evaluating the<br />
change, documenting the impact on job performance and notifying the system that some<br />
action is needed.<br />
It may be, for example, that initial job analyses did not include some ability which,<br />
although not currently measured by the Army, turns out to be important for job<br />
performance. The Army may wish to specifically select individuals with this ability, but<br />
presently there is no mechanism in place that would even uncover the requirement for that<br />
ability. [The closest “system” the Army has for modifying the classification requirements is<br />
to notice there is some problem, such as high attrition, and then request technical advice<br />
from ARI to identify the problem and suggest solutions.]<br />
The JSERT feedback system would provide a more formal, standardized mechanism.<br />
The feedback system must be proven to be scientifically valid and reliable for not only<br />
ensuring that the classification system is working satisfactorily, but also to provide a<br />
framework for intervention. Feedback results indicating a gap between soldier abilities, for<br />
example, and job requirements may indicate that a modification of the training curriculum<br />
is needed and/or that the classification test battery should be altered. The Army will have<br />
the results of the feedback, in addition to any cost-benefit analyses, upon which to base its<br />
correction strategy. Basically, the feedback system will create a means for getting<br />
information about how well the classification battery is working from the field (end-users)<br />
back to the classifiers.<br />
Conclusions<br />
While there will be disruptions to the recruiting, selection/classification and training<br />
systems, changes in roles and responsibilities (e.g., a shift in responsibility from recruiting<br />
to training commands for managing MOS fill), and modifications to computer programs<br />
(ATRRS, REQUEST), the potential benefits of the JSERT concept are considerable. The<br />
concept will: increase the opportunity for MOS consolidation and CMF restructuring,<br />
reduce MOS codes for recruiting, fit well with efforts to consolidate Basic Training sites,<br />
increase the potential for improved soldier-job matches (classification), and increase much<br />
needed manpower management flexibility.<br />
The project has the support of the Office of the Deputy Chief of Staff for Personnel<br />
and the selection of MOS for three potential testbeds (Infantry, Quartermaster, and<br />
Ordnance) is currently being finalized. Although a target start date for the testbed has<br />
been selected (July 1991), it is not clear how the testbed will proceed. The downsizing of<br />
the Army together with high recruiting levels means that approximately 30% of the FY91<br />
accession mission is already in the DEP with contracts for specific MOS training. It is<br />
conceivable that if this pattern keeps up, it will be very difficult to change over fairly<br />
smoothly to the more general enlistment contracts needed to implement the JSERT<br />
concept. Nevertheless, the portions of the project that can proceed (the research elements)<br />
are underway.<br />
Development of a New Language Aptitude Battery<br />
The Defense Language Institute Foreign Language Center (DLIFLC) is the<br />
proponent for the current Defense Language Aptitude Battery (DLAB) and is<br />
also the primary agency with the mission of providing language training for<br />
DOD military personnel. DLAB currently exists in one form. The range of<br />
correlations between DLAB and post-training measures of language<br />
proficiency across different language courses and skill modalities is from<br />
.25 to .55. DLIFLC is seeking to develop an improved aptitude test that<br />
would predict the degree to which a potential student will develop language<br />
proficiency in speaking, reading, and listening skills, and also determine<br />
the language or languages to which a potential student is best suited.<br />
This development effort builds upon an extensive database gathered on<br />
DLIFLC students in a major ongoing project to identify predictors of<br />
success in language training and factors associated with the presence,<br />
direction, and extent of language skill change after training.<br />
Background<br />
At initial screening, candidates for language training must attain a<br />
minimum score on a specified composite of the Armed Services Vocational<br />
Aptitude Battery (ASVAB) in order to be eligible to take the DLAB. There<br />
is some variation in the definition of required ASVAB composites across the<br />
Services, and certain variations in composite cut scores contingent on<br />
eventual training assignment.<br />
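The two-stage screen described above can be sketched in code. The Service names, composite values, and cut scores below are invented placeholders for illustration only, since the actual composites and cuts vary by Service and eventual assignment:

```python
# Hypothetical minimum ASVAB composite scores by Service (invented values).
ASVAB_CUTS = {"Army": 95, "Navy": 100}
# Hypothetical minimum DLAB score (invented value).
DLAB_CUT = 85

def may_take_dlab(service, asvab_composite):
    """Stage 1: the ASVAB composite gates eligibility to take the DLAB."""
    return asvab_composite >= ASVAB_CUTS[service]

def selected_for_training(service, asvab_composite, dlab_score):
    """Stage 2: a qualifying DLAB score is required on top of the ASVAB screen."""
    return may_take_dlab(service, asvab_composite) and dlab_score >= DLAB_CUT
```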
Approximately forty different language courses are taught either at<br />
the DLIFLC Monterey campus or through contract arrangements at other<br />
training locations. The length of basic foreign language courses varies<br />
from 24 to 47 weeks.<br />
DLI concentrates on general foreign language skill training with only<br />
relatively modest specialized training oriented toward specific job<br />
applications. After graduating from DLIFLC and prior to job assignment,<br />
military linguists typically receive advanced individual training (AIT)<br />
building on prerequisite basic language skills. Linguists perform a<br />
variety of sensitive jobs in signal intelligence, human intelligence, and<br />
in a liaison capacity with foreign governments and military forces.<br />
DLIFLC maintains significant contacts with other government and<br />
non-government language training schools and universities. These contacts<br />
have been helpful to DLI in developing instructional systems and measures<br />
of training success that are relatively general in nature, while allowing<br />
more specialized training to benefit from the generally high positive<br />
transfer from basic language skills to more specialized training.<br />
Previous Research<br />
Since 1985, DLIFLC has actively participated in a joint research effort<br />
under the sponsorship of the U.S. Army Intelligence Center and School<br />
(USAICS), with support from the Army Research Institute for the Behavioral<br />
and Social Sciences (ARI). This project, known as the Language Skill Change<br />
Project (LSCP), investigated the following factors:<br />
1. Optimal predictors of success in language training available at<br />
initial screening prior to assignment of language training.<br />
2. Predictors of training success available during training.<br />
3. Variables related to change in language skills after DLI language<br />
training.<br />
The research design involved the collection of an extensive data base<br />
on 1903 Army linguists in four linguist military occupational specialties<br />
(MOS) who had received language training in Spanish, German, Korean, and<br />
Russian at DLIFLC. Data collected at several points in the career cycle of<br />
these linguists included the following elements:<br />
1. ASVAB and DLAB scores at time of selection.<br />
2. Personality measures, interest inventories, and supplementary<br />
cognitive measures collected prior to the beginning of language training.<br />
3. Measures of the extent and nature of motivation to learn foreign<br />
languages collected prior to and during language training.<br />
4. Inventories of student learning strategies collected at two<br />
different times during their language courses.<br />
5. The Defense Language Proficiency Test (DLPT), a series of measures<br />
of foreign language proficiency in listening, reading, and speaking skills<br />
collected immediately after graduation from language training, after<br />
subsequent AIT, and at subsequent annual administrations as mandated by<br />
Army regulations.<br />
DLIFLC and ARI coordinated with the Office of Personnel Management<br />
(OPM) to obtain contractor support to build, collect, manage, and analyze<br />
the LSCP data base. In order to build upon information derived from the<br />
LSCP analyses, DLIFLC requested the contractor, Advanced Technologies<br />
Incorporated (ATI), to submit a plan for the design and development of a<br />
revised DOD language aptitude battery. The remainder of this paper draws<br />
heavily from that plan.<br />
Proposed Development Efforts<br />
The following conclusions drawn from the LSCP data analysis were<br />
relevant to the design of the new battery:<br />
1. Substantial prediction of success in language training, as measured<br />
by the DLPT, was afforded by factors not presently considered in language<br />
selection.<br />
2. The relationship of predictors to criterion performance differed<br />
across languages represented in the study, and within individual languages<br />
across the criterion skills of listening, reading, and speaking.<br />
Consequently, the ATI management plan recommended two approaches for<br />
improving linguist selection and subsequent military linguist performance:<br />
1. to expand the range of factors considered in predicting success<br />
beyond those presently reflected in ASVAB composites and the current DLAB.<br />
2. to attempt to tailor prediction by language and language skill.<br />
From the very beginning, certain constraints on the development of a<br />
new language aptitude instrument were recognized.<br />
First of all, although the new aptitude test will attempt to provide a<br />
more exhaustive assessment of the potential military linguist's<br />
capabilities, the large-scale nature of its use, the time and resources<br />
that are likely to be available for its administration, and concerns about<br />
fatigue on the part of those taking the instrument necessitate that the<br />
time allotted for the new test not greatly exceed that of the current<br />
DLAB. A possible mechanism for achieving maximal efficiency in measurement<br />
would be to use adaptive testing techniques; however, careful consideration<br />
would have to be given to the nature and interrelation of the traits<br />
underlying the abilities to be measured (as yet underspecified), and to the<br />
hardware requirements of such a system and their implications for test<br />
administration.<br />
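As a purely illustrative sketch of the adaptive-testing idea mentioned above (not a description of any planned DLAB II design), a simple staircase procedure can select each next item near the current ability estimate and shrink its adjustments as evidence accumulates; all names and parameters here are assumptions:

```python
def next_item(difficulties, ability, used):
    # Select the unadministered item whose difficulty is closest to the
    # current ability estimate (roughly where item information peaks).
    candidates = [i for i in range(len(difficulties)) if i not in used]
    return min(candidates, key=lambda i: abs(difficulties[i] - ability))

def adaptive_test(difficulties, answer, n_items=10):
    """Administer n_items adaptively with a shrinking-step staircase.
    `answer(d)` returns True if the examinee answers an item of
    difficulty d correctly."""
    ability, step, used = 0.0, 0.5, set()
    for _ in range(n_items):
        i = next_item(difficulties, ability, used)
        used.add(i)
        ability += step if answer(difficulties[i]) else -step
        step = max(step * 0.7, 0.1)  # smaller adjustments later in the test
    return ability
```

With a deterministic examinee who answers correctly whenever difficulty is below 1.0, the estimate settles near that threshold.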
Secondly, it would not be desirable to test different abilities or use<br />
different prediction and scoring formulae for every one of the forty<br />
languages taught in the Defense Foreign Language Program (DFLP). It would<br />
be preferable to group languages into a small number of categories sharing<br />
similar ability requirements and prediction characteristics.<br />
Strategy for Development Effort<br />
The sequence of activities proposed in designing the new aptitude<br />
battery is depicted graphically in Figure 1.<br />
[Figure 1. Proposed sequence of activities, including: Define Measurement Options; Identify Language Ability Requirements; Group Languages by Ability Requirements; Produce DLAB II.]<br />
The first activity listed is to define the components of the criterion<br />
to be predicted--foreign language proficiency. A first cut might be the<br />
traditionally identified skill modalities of listening, reading,<br />
speaking, and writing; these traditional categories will need to be further<br />
analyzed into much more specific task components.<br />
The next three activities planned are to identify how languages differ<br />
on proficiency dimensions, to identify abilities required to develop each<br />
type of proficiency, and to identify language differences in ability<br />
requirements. Note that the arrows in Figure 1 do not all point in a<br />
forward direction toward higher numbered activities. As explained below,<br />
these activities are expected to be interactive and iterative processes and<br />
to involve the synthesis of several types of information into a final<br />
product based on consensus of project team members.<br />
The contractor and DLIFLC decided that the accomplishment of Activities<br />
1 through 6 would be best facilitated by an interdisciplinary approach<br />
involving a DLI expert in language proficiency testing, a cognitive<br />
psychologist specializing in the area of foreign language learning, and<br />
foreign language curriculum specialists with expertise in a wide variety of<br />
foreign languages. The intent was to combine insights from the traditional<br />
perspective of linguistic analysis with a cognitively oriented analysis of<br />
the language learning process. The interdisciplinary team is expected<br />
to develop a comprehensive list of abilities involved in learning foreign<br />
languages, including abilities that may be required in some languages but<br />
not all. On the one hand, this involves ensuring that the definition of<br />
language proficiency is detailed enough that the cognitive abilities<br />
required to develop each category of proficiency can be specified. On the<br />
other hand, it involves reaching a consensus that the list of abilities is<br />
broadly applicable across foreign languages and across diverse training<br />
programs in these foreign languages.<br />
It is anticipated that the team would draft a preliminary taxonomy of<br />
abilities as a baseline list of skills which would be iteratively modified<br />
and improved as attempts were made to apply it to successive individual<br />
languages. At the same time the team would highlight any features of each<br />
successive language that experience had shown were relatively hard or<br />
relatively easy for native English speakers to learn. This process would<br />
serve both to validate the taxonomy and to identify differences between<br />
languages in cognitive abilities required.<br />
In an effort parallel to the validation of
Implementation of Content Validity Ratings<br />
in Air Force Promotion Test Construction<br />
Carlene M. Perry<br />
United States Air Force Academy<br />
John E. Williams<br />
Paul P. Stanley II<br />
USAF Occupational Measurement Squadron<br />
The USAFOMS has implemented a procedure in which subject-matter experts<br />
(SMEs) rate the content validity of individual test questions on Air Force<br />
promotion tests. This paper describes that procedure and assesses its impact<br />
upon test content and the perceptions of those involved in test development.<br />
Specialty Knowledge Tests (SKTs) are 100-item multiple-choice tests which<br />
measure airmen's knowledge in their assigned Air Force specialties. Promotion<br />
to the ranks of staff sergeant (E-5) through master sergeant (E-7) is<br />
determined solely by relative ranking under the Weighted Airman Promotion<br />
System (WAPS). Each airman receives a single WAPS score, the sum of six com-<br />
ponent measures, with the SKT accounting for up to 22% of the total. Since<br />
the other components generally yield little variance among individuals, the<br />
SKT is often the deciding factor in determining who gets promoted.<br />
SKTs are written by teams of senior NCOs with the assistance of USAFOMS psychologists.<br />
They are constructed using the content validity strategy of validation.<br />
The following components ensure a direct and close relationship<br />
between test items and important facets of the specialty being tested: 1)<br />
the specialty training standard, an Air Force document which identifies the<br />
primary duties and responsibilities in a specialty, 2) CODAP-based occupational<br />
analysis data collected by USAFOMS, and 3) the expertise of the SMEs<br />
themselves.<br />
Content validity is thoroughly documented for the more than 400 SKTs in the<br />
USAF inventory. However, the documentation is predominantly qualitative<br />
rather than quantitative in nature, as is the norm with tests based on this<br />
strategy. Test developers at USAFOMS felt that a quantitative means of assessing<br />
content validity would be useful to improve their feedback to test<br />
writers and to make it possible to study test results longitudinally to help<br />
identify problem tests.<br />
Lawshe (1975) developed just such an approach, the first to focus on content<br />
validity in a quantitative, rather than a qualitative manner. His method<br />
called for a panel of subject-matter experts (SMEs) to independently rate<br />
test items using the following scale:<br />
Is the skill (or knowledge) measured by this test question:<br />
- Essential (2),<br />
- Useful but not essential (1), or<br />
- Not necessary (0),<br />
for successful performance on the job?<br />
He then used a test of statistical significance with the content validation<br />
panel’s ratings as a basis for eliminating items from a test item pool.<br />
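Lawshe's quantitative index, the content validity ratio (CVR), is computed from the panel counts; this sketch follows the formula given in Lawshe (1975), with the significance test (comparing CVRs against tabled minimum values) omitted:

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's (1975) content validity ratio:
    CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists
    rating the item "essential" and N is the panel size. CVR ranges
    from -1 (no panelist says essential) to +1 (all do); 0 means
    exactly half the panel rated the item essential."""
    half = n_panelists / 2
    return (n_essential - half) / half
```

For example, 8 of 10 panelists rating an item essential yields CVR = (8 − 5)/5 = 0.6.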
Lawshe’s procedure has been applied by a number of investigators in a variety<br />
of situations. Distefano, Pryer, and Craig (1980) used his content valida-<br />
tion procedure to assess a job knowledge test being used as a criterion measurement<br />
of psychiatric aide training success. They stated, "It is evident<br />
that the content validity could be improved in subsequent revisions if the<br />
method is used as part of the test construction process.”<br />
Kesselman and Lopez (1979) developed an accounting job knowledge test using<br />
the procedure which they found to be superior to a commercially available<br />
mental ability instrument in predicting two criteria: supervisor assessment<br />
of subordinate job knowledge and supervisor assessment of overall job perfor-<br />
mance.<br />
Distefano, Pryer, and Erffmeyer (1983) showed that a variation of the Lawshe<br />
method could be used in the development of a behavioral rating scale of job<br />
performance, while providing quantitative content validity evidence of, the<br />
criterion scale.<br />
Finally, Carrier, Dalessio, and Brown (1990) used Lawshe’s three-point scale<br />
to focus on the correspondence between inferences made using content validity<br />
and criterion-related validity strategies. They found that for experienced<br />
candidates, job experts seemed to be able to identify those items on an interview<br />
guide that predicted the commissions of personnel into the Life Insurance<br />
Marketing and Research Association. They also noted that "...using<br />
content validity as the sole evidence for test validity should be limited to<br />
situations where test developers are working with well-defined constructs,<br />
such as acquired skills or specific knowledge.”<br />
Method<br />
Two content validity rating (CVR) forms were developed using a Lawshe-type<br />
scale, one form fo.r development of the SKT taken for promotion to E-5 and one<br />
for the development of the SKT taken for promotion to E-6 and E-7. The<br />
USAFOMS forms incorporated minor adjustments to the Lawshe approach. In par-<br />
ticular, it was necessary to reference the grade level of the test, since<br />
knowledges required for successful performance of the E-5 duties may be considerably<br />
different from those required to perform E-6 and E-7 duties. In<br />
addition, the USAFOMS forms focused on successful performance in the specialty,<br />
not just in the job, a much broader view, since a specialty encompasses a<br />
family of related jobs.<br />
Whereas Lawshe used a test of statistical significance with the panel's ratings,<br />
this was not practical at USAFOMS because of the small number of individual<br />
raters being used. Only two to six SMEs normally participate in a<br />
test development project. To require statistical significance with such a<br />
small sample would require unanimous agreement of item essentiality. Rather<br />
than impose strict statistical criteria with the new ratings, the USAFOMS<br />
policy was stated as follows: "CVRs will be used to encourage SMEs to focus<br />
first on the appropriateness of test item content as it relates to successful<br />
performance in the specialty." SMEs were not required to take special actions<br />
as a result of the various ratings. In essence, SMEs could retain (ei-<br />
ther reuse on the next revision of the test or designate as an alternate) or<br />
deactivate (designate as unacceptable for reuse) an item without regard to<br />
their ratings on the CVR forms. There are, however, other requirements such<br />
as clear reference support and acceptable item statistics that must be met if<br />
an item is to be retained.<br />
This research examines how Lawshe's procedure, with the noted modifications,<br />
was employed at USAFOMS -- an organization whose promotion tests impact most<br />
Air Force enlisted specialties. The first objective was to determine the<br />
extent to which SME ratings on the CVR forms impact subsequent identification<br />
of items as acceptable or unacceptable for reuse. The second is to determine<br />
how SMEs and project psychologists perceive the value and usefulness of the<br />
forms. Ninety-four SMEs, representing 25 AFSs, assigned to USAFOMS for SKT<br />
rewrite duties were asked to rate test items from their respective E-5 and<br />
E-6/7 grade-level SKTs using the CVR forms. USAFOMS test development proce-<br />
dures require the completion of this step prior to the SMEs' designation of<br />
an item as either acceptable for continued use on subsequent SKTs or as unac-<br />
ceptable for reuse. Once again, these test items were rated using Lawshe’s<br />
3-point scale. A rating of 2 was given to items whose content was essential.<br />
A rating of 1 was assigned to those items whose content was useful, but not<br />
essential, and a rating of 0 was assigned to those items whose content was<br />
not necessary for successful performance in the AFS. In all, 19,700 ratings<br />
were obtained from 94 raters for 2 SKT levels (E-5 and E-6/7).<br />
Results<br />
Intraclass correlations for each of the 25 AFSs were computed to determine<br />
the interrater reliabilities for the group of SMEs from each specialty. All<br />
but two of the calculated values had p < .05. (The higher reliability values<br />
obtained seemed to be associated with the more technologically specialized<br />
fields where there is little room for variance of procedures across the<br />
Air Force, thus leading to more agreement among SMEs on items which test essential<br />
knowledge. Lower values seemed to be associated with broader specialties<br />
where there is more variance in day-to-day jobs performed and hence,<br />
less agreement and lower values of reliability.)<br />
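The paper does not state which intraclass correlation variant was computed; as one common choice, a one-way random-effects ICC for an items-by-raters matrix might be computed as follows (an illustration only, not the USAFOMS procedure):

```python
def icc_oneway(ratings):
    """One-way random-effects intraclass correlation, ICC(1),
    for an items x raters matrix of ratings:
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    item_means = [sum(row) / k for row in ratings]
    # Between-items and within-items mean squares.
    msb = k * sum((m - grand) ** 2 for m in item_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(ratings, item_means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Perfect agreement across raters yields an ICC of 1.0; pure disagreement yields a negative value.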
The average CVR for items chosen as acceptable and for those designated for<br />
deactivation were also calculated for each test project. The mean CVR value<br />
for all deactivated items was 1.28 and the mean CVR value for all acceptable<br />
items was 1.43. These results conformed to our expectations that on the<br />
whole, items selected for deactivation would have lower content validity ratings<br />
than those chosen as acceptable. The average CVR value for all 19,700<br />
ratings obtained was 1.40. This average reflects the fact that on the whole,<br />
Air Force SKTs are viewed as being relatively high in content validity.<br />
To determine the actual impact, if any, of the content validity ratings on<br />
the subsequent identification of an item for reuse, a chi-square test of statistical<br />
significance was computed. The null hypothesis (H0) for this test<br />
states that there is no difference between the proportion of items selected<br />
as acceptable and unacceptable in each rating category. The alternative hypothesis<br />
(H1) is that the distribution of items in each rating category differs<br />
from the hypothesized one. The results, as shown in Table 1, indicate<br />
significant differences between expected and observed values for acceptable<br />
and deactivated items in each rating category, with the largest differences<br />
occurring between ratings of 2 and 0. As shown, 203 more ratings of 0 were<br />
observed for deactivated items than was expected, while 234 more ratings of 2<br />
were observed for acceptable items than was expected. A chi-square value of<br />
199.7 (df=2) was obtained, indicating significance at the .01 level. On<br />
this basis, the null hypothesis was rejected, indicating a disproportional<br />
representation of items selected as acceptable and unacceptable in each rating<br />
category. This shows that item content validity did impact subsequent<br />
identification of item acceptability.<br />
A point-biserial correlation coefficient relating identification of an item<br />
as either acceptable or deactivated with the item's average content validity<br />
rating was also computed. The resulting correlation coefficient, .0894, is<br />
significant at the .01 level, yet is rather low. This can be attributed to<br />
the fact that of the 19,700 total ratings, 8,387 were ratings of 1. Since<br />
the largest differences in proportional representations were found to be in<br />
the rating categories of 0 and 2 and the number of ratings of 1 were within<br />
31 ratings of the expected value, it appears that the correlation coefficient<br />
may have been decreased by the large number of ratings of 1.<br />
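The attenuating effect of a mass of middle ratings on a point-biserial correlation can be demonstrated with invented data (not the SKT ratings themselves); the helper below is a generic population-SD point-biserial:

```python
from math import sqrt

def point_biserial(ratings, accepted):
    """Point-biserial correlation between item ratings and a 0/1
    acceptable/deactivated indicator (population-SD form)."""
    n = len(ratings)
    g1 = [r for r, a in zip(ratings, accepted) if a == 1]
    g0 = [r for r, a in zip(ratings, accepted) if a == 0]
    mean = sum(ratings) / n
    sd = sqrt(sum((r - mean) ** 2 for r in ratings) / n)
    return ((sum(g1) / len(g1) - sum(g0) / len(g0)) / sd
            * sqrt(len(g1) * len(g0) / n ** 2))
```

Adding ratings of 1 split evenly across both groups leaves the group difference intact but shrinks the correlation, mirroring the attenuation discussed above.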
Table 1. Comparison of Acceptable and Deactivated Test Items<br />
OBSERVED<br />
           Acceptable   Deactivated   Total<br />
Rating 0       1213          515       1728<br />
Rating 1       6840         1547       8387<br />
Rating 2       8086         1499       9585<br />
Total         16139         3561      19700<br />
EXPECTED<br />
           Acceptable   Deactivated   Total<br />
Rating 0     1415.6        312.4       1728<br />
Rating 1       6871         1516       8387<br />
Rating 2     7852.4       1732.6       9585<br />
Total         16139         3561      19700<br />
X² = 199.7   df = 2   p < .01<br />
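The chi-square statistic reported in Table 1 can be reproduced directly from the observed counts with a generic test of independence (a sketch, not USAFOMS code):

```python
def chi_square(observed):
    """Chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (o - expected) ** 2 / expected
    return chi2

# Observed (acceptable, deactivated) counts by rating category, from Table 1.
table = [[1213, 515],   # rating 0
         [6840, 1547],  # rating 1
         [8086, 1499]]  # rating 2
# chi_square(table) comes out at about 199.7, matching the reported value.
```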
Through the analysis of the data, a number of unusual cases surfaced. The<br />
data contained in these cases was contrary to expectations and evoked further<br />
analysis. For instance, 10 of the 5,200 test items evaluated were given rat-<br />
ings of 0 (not necessary) by all SMEs, yet still retained as acceptable<br />
items. Perhaps the SMEs reconsidered the item content and found it essential<br />
to successful job performance. Furthermore, 117 of the 940 deactivated items<br />
were given ratings of 2 (essential) by all SMEs. After obtaining the reasons<br />
for deactivation, these items were broken down into six categories:<br />
REASON FOR DEACTIVATION                   # ITEMS   % OF TOTAL<br />
Poor Statistics/No Acceptable Revision       57        49%<br />
Inadequate Reference                         37        32%<br />
Obsolete Item                                11         9%<br />
Two or More (or No) Correct Answers           8         7%<br />
Low Content Validity                          3         3%<br />
Inadvertent Duplication of Item               1         1%<br />
As stated earlier, there are other requirements not directly related to content<br />
validity that must be met if an item is to be retained. These include<br />
clear reference support and acceptable item statistics, thus most of the 117<br />
items were deactivated for valid reasons. It was surprising, however, that<br />
3% of the items in question had "low content validity" cited on the item<br />
record card as the reason for deactivation when originally all SMEs had felt<br />
the item content was essential for successful job performance. These results<br />
could be due to administrative error, SME reconsideration of the item con-<br />
tent, or a number of other possible reasons.<br />
The second objective of this research was to determine how SMEs and project<br />
psychologists perceived the value and usefulness of the CVR forms. A three-<br />
question survey was administered to 21 project psychologists at USAFOMS. A<br />
similar survey was administered to 151 SMEs upon completion of the rating<br />
forms and selection of items for deactivation. A four-point rating scale was<br />
used for the responses: strongly agree, agree, disagree, and strongly dis-<br />
agree. The questions and summary of responses are as follows:<br />
(1) When selecting previously used items to be reused, the Content Validity<br />
Rating forms helped identify those items most essential to successful<br />
performance in the specialty.<br />
49% (74) SMEs answered positively (agree or strongly agree)<br />
52% (11) project psychologists answered positively<br />
(2) The Content Validity Rating forms helped bring out different points<br />
of view for discussion.<br />
56% (85) SMEs answered positively<br />
76% (16) project psychologists answered positively<br />
(3) The Content Validity Rating forms were a valuable tool in selecting<br />
items to be reused.<br />
39% (59) SMEs answered positively<br />
43% (9) project psychologists answered positively<br />
Discussion<br />
Through the analysis previously described, it became apparent that there was<br />
a significant impact of content validity ratings on subsequent identification<br />
of an item as acceptable or deactivated. For instance, with the 25 projects<br />
sampled, there were 52 individual SKTs examined. Of these 52 SKTs, 37 had<br />
higher average CVR values for acceptable items than the average CVR values of<br />
the deactivated items which is what would be expected -- a positive differ-<br />
ence between the two. It is also important to note that 6 of the 52 SKTs are<br />
not applicable in this analysis since in these cases all 100 items were designated<br />
as acceptable and thus, there was no average CVR value for deactivated<br />
items. The 9 remaining SKTs had higher average CVR values for the deactivated<br />
items than the average CVR values for the acceptable items. Of these 9<br />
SKTs, 2 were from projects with insignificant interrater reliabilities. Even<br />
though the expected effect did not hold true for every case, overall, items<br />
higher in content validity had a greater chance of being acceptable while<br />
items lower in content validity were more likely to be deactivated. This<br />
shows that on the whole, item content validity does play a role in the SMEs’<br />
evaluation of an item's testworthiness.<br />
The second objective of the research was to determine how SMEs and project<br />
psychologists perceived the usefulness of the CVR forms. It became evident<br />
that there was no universal agreement on the usefulness of the forms. Additionally,<br />
any project psychologist biases, either for or against the use of<br />
the forms, may have influenced how the psychologist administered the forms to<br />
the SMEs. This in turn may have biased the SMEs' ratings on the CVR forms<br />
and on the surveys as well.<br />
The results of this study suggested several areas for future research.<br />
First, to the extent that the forms are used, one would expect the content<br />
validity of SKTs to improve over time since ideally, the forms would help<br />
SMEs identify and retain those items most essential to successful performance<br />
in the specialty. This could be observed by charting the average content<br />
validity rating for all SKTs over a period of years.<br />
Also, with the imminent manpower cutbacks, the USAFOMS mission may be directly<br />
affected. By charting these average content validity rating values, it<br />
would be possible to see whether the content validity of the tests declined.<br />
This would be helpful in illustrating the impact of cutbacks on the test development<br />
mission at USAFOMS.<br />
Finally, it would be interesting to examine the relationship between project<br />
psychologist and SME responses to the survey questions and to investigate the<br />
possibility of psychologist biases affecting the SMEs’ use of the forms.<br />
Although the attitudinal portion of the research showed some disagreement as<br />
to the value of this quantitative procedure, statistically, the content validity<br />
of the items has a significant impact on the subsequent identification<br />
of items for reuse.<br />
References<br />
Carrier, M. R., Dalessio, A. T., and Brown, S. H. (1990). Correspondence<br />
between estimates of content and criterion-related validity values. Personnel<br />
Psychology, 43, 85-100.<br />
Distefano, M. K., Jr., Pryer, M. W., and Craig, S. H. (1980). Job-relatedness<br />
of a posttraining job knowledge criterion used to assess validity and<br />
test fairness. Personnel Psychology, 33, 785-793.<br />
Distefano, M. K., Jr., Pryer, M. W., and Erffmeyer, R. C. (1983). Application<br />
of content validity methods to the development of a job-related performance<br />
rating criterion. Personnel Psychology, 36, 621-631.<br />
Fitzpatrick, A. R. (1983). The meaning of content validity. Applied Psychological<br />
Measurement, 7, 3-13.<br />
Kesselman, G. A. and Lopez, F. E. (1979). The impact of job analysis on em-<br />
ployment test validation for minority and nonminority accounting personnel.<br />
Personnel Psychology, 32, 91-108.<br />
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel<br />
Psychology, 28, 563-575.<br />
Interpreting Rating Scale Results:<br />
What does a Mean Mean?<br />
Barbara Jezior<br />
U.S. Army<br />
Natick Research, Development, and Engineering Center<br />
Natick, MA.<br />
Larry Lesher<br />
GEO-CENTERS, Inc.<br />
Newton Centre, MA.<br />
Richard Popper<br />
Ocean Spray Cranberries Inc.<br />
Lakeville-Middleboro, MA.<br />
Charles Greene<br />
U.S. Army<br />
Natick Research, Development, and Engineering Center<br />
Natick, MA.<br />
Vanessa Ince<br />
U.S. Army<br />
Natick Research, Development, and Engineering Center<br />
Natick, MA.<br />
How well soldiers like items they use in garrison or the field is often measured on<br />
Likert scales, and the mean ratings obtained from these scales are then used as<br />
indicators of user acceptance. In examining data contributing to 176 mean ratings<br />
of various Natick products we found that the means accurately predict the acceptor<br />
set, i.e. the percentage of soldiers who rated a product on the positive end of a scale.<br />
Knowing the percentage who find a product acceptable provides a more intuitive<br />
and concrete basis for product development or improvement decisions. For<br />
example, the product developer can operate from the knowledge that 66% find the<br />
product acceptable, instead of a mean rating that deems the product “slightly good.”<br />
Introduction<br />
Natick is deeply involved in consumer acceptability<br />
issues. We develop basic subsistence items for servicemen<br />
- rations, protective clothing, shelters, and airdrop<br />
equipment. These products support an annual procurement<br />
of over 3 billion dollars, making consumer (soldier)<br />
acceptance critical. Items that are unacceptable could sit<br />
in warehouses or never be used, and the soldier would be<br />
lacking necessary equipment as well.<br />
To obtain additional quantitative information on how<br />
soldiers felt about our products, we started a large-scale<br />
systematic survey program six years ago. Like many, we<br />
operated under the assumption that one of the best ways to<br />
measure and describe how well the soldier liked the<br />
products was to use the mean and other parameters derived<br />
from verbal rating scales.<br />
After analyzing over 7,000 questionnaires throughout<br />
the six years and writing many reports for managers and<br />
product project officers we began to question this assumption.<br />
We ourselves began to get curious about what means<br />
were saying in respect to the measure of product accepta-<br />
bility. For instance, while we felt that a mean of 5 on a 7-point<br />
scale should denote a relatively acceptable product,<br />
we found that we usually had many more negative ratings<br />
than expected. Over time we also began to feel, on an<br />
intuitive level, that a mean of 6 on a 7-point scale indicated<br />
a “very” good product but our verbal anchor was labelling<br />
such a product as “moderately” good.<br />
Moreover, in describing survey results to product<br />
managers, we found that while the concept of an average<br />
is rather commonly understood, the accompanying parameters<br />
of standard deviations, skewness, etc., are not<br />
understood outside the research communities, nor should<br />
we expect them to be. The problem here is that a mean in<br />
isolation, which is what a manager is grappling with when<br />
not understanding its accompanying parameters, can be<br />
very misleading. A manager who makes product decisions<br />
without some sense of what a rating distribution is all<br />
about may make the wrong decisions.<br />
Another problem with means for many is that they<br />
don’t provide a good intuitive feel for what relative<br />
differences are in regard to measuring products, any<br />
statistically significant differences notwithstanding. For<br />
instance, if means differ by one scale point, some don’t<br />
think that difference especially disconcerting, while others<br />
think it’s monumental - and those viewpoints can hold<br />
irrespective of whether there is an understanding of the<br />
underlying distribution or not.<br />
These issues led us to the literature to see what had<br />
been reported on rating scale distributions in respect to<br />
product acceptability.<br />
The literature showed that, in recent years, a new<br />
objective measure for determining level of product acceptability<br />
labelled the “acceptor set” has been described<br />
in marketing research literature, especially that of the food<br />
industry (Gordon and Norback, 1985). The measure has<br />
been used in conjunction with food product optimization<br />
techniques and market positioning (Lagrange and Norback,<br />
1987).<br />
While the acceptor set can be determined by a simple<br />
binary method (dichotomous question), both Gordon<br />
(1985) and Choi and Kosikowski (1985) described creating<br />
an “acceptor set” from scaled data by splitting the<br />
sample group into two percentages, the percentage who<br />
found a product acceptable and those who did not. For<br />
example, respondents to our 7-point scale (1=“dislike<br />
very much” to 7=“like very much”) could be split into two<br />
groups - either the 5-7 group or the 1-4 group, with the<br />
5-7 group constituting the acceptor set. Product optimization<br />
then means finding methods to increase the acceptor<br />
set percentages (however derived) as measures of product<br />
improvement.<br />
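The binary split described above is simple to operationalize. A minimal sketch in Python (the function name, cutoff argument, and sample ratings are ours, for illustration only):<br />

```python
from typing import Sequence

def acceptor_set_pct(ratings: Sequence[int], cutoff: int = 5) -> float:
    """Percent of ratings at or above the cutoff (the 'acceptor set').

    For the 7-point scales described here, ratings of 5-7 count as
    acceptors; for a 9-point scale, pass cutoff=6.
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    acceptors = sum(1 for r in ratings if r >= cutoff)
    return 100.0 * acceptors / len(ratings)

# Hypothetical ratings: 6 of 10 raters fall in the 5-7 range.
print(acceptor_set_pct([7, 5, 6, 4, 2, 5, 3, 6, 5, 1]))  # 60.0
```

Product optimization then amounts to tracking this percentage across product variants.<br />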
Given those findings, we decided to look at the acceptor set<br />
concept in respect to our survey data base to see how<br />
we could add to the definition of product acceptability for<br />
both the manager and researcher.<br />
Methodology<br />
Data base description<br />
The rating scale data were obtained on questionnaires<br />
administered to approximately 7,500 combat arms soldiers<br />
who rated products such as field rations, protective<br />
clothing, tents, and airdrop equipment. Data collectors<br />
went to the survey sites after these soldiers had returned<br />
from major training exercises where they had used one or<br />
more of the products. Entire units were tasked to participate<br />
in the surveys; soldiers in these units could refuse to<br />
fill out questionnaires if they chose, but few did. The<br />
sample size at each site ranged from 200 to 400. The<br />
soldier population was male, with over 90% between the<br />
ages of 19 and 23 and serving in the enlisted ranks<br />
E-2 to E-4.<br />
The verbal rating scales were either 7-point or 9-point<br />
scales. The 9-point scale, which has also been called a<br />
hedonic scale (Peryam and Pilgrim, 1957), has been a<br />
scale traditionally used in military and civilian food research<br />
for more than 30 years (Maller and Cardello, 1984).<br />
It was only used for rating the taste of specific ration (food)<br />
items. Acceptability ratings for ration attributes other<br />
than taste (e.g. acceptability of portion sizes) were obtained<br />
on 7-point scales.<br />
The verbal anchors for the 7-point scales were: good-bad,<br />
satisfied-dissatisfied, easy-difficult, comfortable-uncomfortable,<br />
and like-dislike. The 9-point scale anchors<br />
were like-dislike. Each scale had adverb modifiers for the<br />
anchors that graduated in intensity. For example, the<br />
7-point good-bad scale was:<br />
VERY BAD (1), MODERATELY BAD (2), SLIGHTLY BAD (3), NEITHER BAD NOR GOOD (4), SLIGHTLY GOOD (5), MODERATELY GOOD (6), VERY GOOD (7).<br />
Each of the scales had a neutral point, and the positive<br />
verbal anchors for the scales were at the high ends, i.e. 5-<br />
7 for the 7-point scale and 6-9 for the 9-point scale. The<br />
product acceptance issues covered a wide range of variables such as durability,<br />
appearance, comfort, taste, weight,<br />
compatibility (with other pieces of equipment), weatherproofing,<br />
warmth, and “overall” acceptability.<br />
Analysis<br />
We based our analysis on randomly selected mean<br />
ratings from our survey data. The number of means<br />
selected was 155 for the 7-point scale and 21 for the<br />
9-point. The largest sample contributing to any particular<br />
mean numbered 347, and the smallest 34. The lowest<br />
mean rating on the 7-point scale was 2.94 and the highest<br />
was 6.53; the lowest for the 9-point was 3.01 and the<br />
highest 6.35. The mean of the means obtained on the<br />
7-point scales was 4.71 (SD=.75), while the mean of the<br />
means obtained on the 9-point scale was 4.59 (SD=.91).<br />
The distributions for all the selected means were unimodal.<br />
We explored the relationships of the means to the size<br />
of the acceptor sets through regression analyses. The<br />
acceptor set definition was the percent of ratings falling in<br />
the entire positive range for either scale, i.e. 5-7 for the<br />
7-point and 6-9 for the 9-point scale.<br />
Results<br />
The results show extremely good fits with linear<br />
regression models for both 7- and 9-point scales. Figure<br />
1 shows the scatter plot for the relationship of the means<br />
to the acceptor sets for the 7-point scale. The R² in this case<br />
is .97 with a regression equation of:<br />
y = - 54.45 + 24.13x.
Figure 2’s scatterplot shows the relationship of means<br />
to acceptor set for the 9-point scale; the R² is .98 and the<br />
regression line is:<br />
y = - 26.04 + 14.71x.<br />
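Either reported equation turns a mean rating into a predicted acceptor-set size with simple arithmetic; a sketch (the function name is ours):<br />

```python
def acceptor_set_from_mean(mean: float, scale_points: int = 7) -> float:
    """Predicted acceptor-set size (%) from a mean rating, using the
    regression equations reported in the text."""
    if scale_points == 7:
        return -54.45 + 24.13 * mean
    if scale_points == 9:
        return -26.04 + 14.71 * mean
    raise ValueError("equations were fit only for 7- and 9-point scales")

print(round(acceptor_set_from_mean(5.0)))     # 66
print(round(acceptor_set_from_mean(6.0)))     # 90
print(round(acceptor_set_from_mean(7.0, 9)))  # 77
```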
Figure 1. Scatter plot of acceptor set sizes against mean ratings (7-point scale).<br />
Figure 2. Scatter plot of acceptor set sizes against mean ratings (9-point scale).<br />
Discussion<br />
When we derive acceptor set sizes from the regression<br />
equations for both the 7- and 9-point scales, the size of the<br />
acceptor sets affirms what we were seeing in our data for<br />
individual products. For instance, the acceptor set for a<br />
mean of 5 on the 7-point scale corresponds to an acceptor<br />
set size of 66% of the population, whereas a mean of 6 corresponds<br />
with 90%. A far greater number of negative<br />
ratings are seen for a mean of 5 as opposed to a 6: the<br />
negative and neutral population is decreased by 25%<br />
between means of 5 and 6.<br />
As mentioned earlier, the 9-point scale has been used<br />
for rating food items in military rations since 1957. Senior<br />
researchers who have spent many years in ration acceptability<br />
feel sure they have a very good item if a rating is a<br />
7 (“like moderately”). That is, the 7 is not a good rating<br />
by default or relative stature of the item in the ratings list;<br />
the feeling is that an item with a rating of 7 is very<br />
acceptable in an absolute sense. Correspondingly, the<br />
acceptor set picture for the 9-point scale regression is: a<br />
mean rating of 7 shows an acceptor set size of 77%, 7.5<br />
shows 84%, and 8 shows 91%.<br />
The situations described above point to how acceptor<br />
sets can aid the definition of product acceptability. If we<br />
unshackle the description of product acceptance from the<br />
scale verbal anchors, which can make it appear that products<br />
are falling somewhat short because they are not<br />
achieving perfect scores, it may facilitate definition of<br />
product norms that are easier to deal with both intellectually<br />
and at gut level.<br />
For instance, if you tell product developers that a<br />
product is top of the line if 90% of the populace rates it<br />
positively, the statement has an intuitive logic to it. Product<br />
developers assume that no one product can please<br />
100% of the population. Even if there were such a product,<br />
it would still probably not achieve a perfect rating on any<br />
scaled measure because there is a lot at play in the rating<br />
game, e.g., raters tend to avoid end points on scales no<br />
matter how they feel about a product, frames of reference<br />
can be different among raters in regard to a product, and<br />
even the mood the rater is in that day can affect his or her<br />
rating.<br />
What the norms for products should be, as defined by<br />
the size of the acceptor set, i.e., excellent, good, average,<br />
or poor, are to be determined. One approach might be to<br />
determine the cumulative distribution frequencies for<br />
acceptor sets and think in terms of percentiles. Figure 3<br />
shows the application of this concept to the 7-point scale<br />
data; the graph shows that an acceptor set of 45% falls in<br />
the 25th percentile, while a set of 74% falls in the 75th<br />
percentile. To achieve a product that scores better than<br />
80% of all products tested, an acceptor set size of about<br />
77% is needed.<br />
Figure 3. Cumulative percent distribution of acceptor set sizes (7-point scale).<br />
Other qualitative or quantitative data<br />
could also be used in conjunction with acceptor set size to<br />
establish breakpoint criteria.<br />
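The percentile idea can be sketched directly from a batch of observed acceptor-set sizes (the numbers below are hypothetical, not the survey data):<br />

```python
def percentile_rank(value: float, observed: list) -> float:
    """Percent of observed acceptor-set sizes at or below `value`."""
    return 100.0 * sum(1 for v in observed if v <= value) / len(observed)

# Hypothetical acceptor-set sizes from ten product surveys.
sets = [30, 38, 45, 52, 58, 61, 66, 70, 74, 81]
print(percentile_rank(66, sets))  # 70.0
```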
Going one step further, norms could, and should, be<br />
established for different product groups. Some types of<br />
products by their nature will never have large acceptor<br />
sets, so they should not have to be measured against<br />
products that do.<br />
Our findings obviously reinforce the research attesting<br />
to the value of the acceptor set to managers in the<br />
commercial world concerned with market positioning and<br />
product optimization. The military world can also benefit.<br />
Although our consumer is a captive consumer so to speak,<br />
there may be some bottom line applications of the acceptor<br />
set that means can’t address.<br />
For instance, the U.S. Army can spend around<br />
$31,000,000 a year on its standard operational ration. For<br />
the sake of hypothesis, assume those who didn’t like it<br />
didn’t eat it. What would that mean in terms of dollars? If<br />
you were to assume further the ration overall had a mean<br />
rating of 5 (7-point scale), 66% would be eating it, and if<br />
it had a mean rating of 6, 90% would be eating it. The<br />
differential of that one scale point amounts to $7,440,000<br />
in uneaten rations. This type of accountability would<br />
behoove the developer to improve acceptability in a way<br />
that simply looking at means couldn’t.<br />
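The dollar arithmetic in the ration example works out as follows (the $31,000,000 figure and the 66%/90% acceptor sets come from the text; the rest is the stated hypothesis that non-acceptors don't eat the ration):<br />

```python
ANNUAL_COST = 31_000_000  # yearly spend on the standard operational ration

def uneaten_dollars(acceptor_pct: float) -> float:
    """Dollars of ration going uneaten if non-acceptors don't eat it."""
    return ANNUAL_COST * (100 - acceptor_pct) / 100

waste_at_mean_5 = uneaten_dollars(66)  # mean of 5 -> 66% acceptor set
waste_at_mean_6 = uneaten_dollars(90)  # mean of 6 -> 90% acceptor set
print(int(waste_at_mean_5 - waste_at_mean_6))  # 7440000
```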
Overall, we recommend using an acceptor set to<br />
communicate levels of acceptability and to use this measure<br />
in tandem with traditional scale statistics. Scale<br />
parameters still convey information that dichotomous or<br />
other qualitative data cannot. For the manager, however,<br />
the acceptor set will provide a far more intuitive grasp of<br />
the product findings and a firmer footing for product<br />
optimization or market positioning decisions.<br />
The excellent fit of the linear regression model is<br />
especially gratifying because of the simplicity it offers.<br />
The acceptor set grows linearly with product acceptance<br />
means. One integer of improvement in a mean translates<br />
into a constant change in the acceptor set of about<br />
24 percentage points on a 7-point scale.<br />
References<br />
Choi, H.S., and Kosikowski, F.V. (1985). Sweetened<br />
plain and flavored carbonated yogurt beverages.<br />
Journal of Dairy Science, 68, 913.<br />
Gordon, N.M. (1985). A product development, positioning<br />
and optimization process guided by organizational<br />
objectives. Master of Science Thesis. University<br />
of Wisconsin, Madison.<br />
Gordon, N.M. and Norback, J.P. (1985). Choosing objective<br />
measures when using sensory methods for optimization<br />
and product positioning. Food Technology,<br />
39(11), 96.<br />
Lagrange, V. and Norback, J.P. (1987). Product optimization<br />
and the acceptor set size. Journal of Sensory<br />
Studies, 2, 119-136.<br />
Maller, O. and Cardello, A. (1984). Ration acceptance<br />
methods: measuring likes and their consequences.<br />
Nederlands Militair Geneeskundig<br />
Tijdschrift, 37(79/110), 91-96.<br />
Peryam, D.R. and Pilgrim, F. (1957). Hedonic scale<br />
method of measuring food preferences. Food Technology,<br />
11(9), Supplement 9.
Joint-Service Computerized Aptitude <strong>Testing</strong><br />
W. A. Sands*<br />
Director, <strong>Testing</strong> Systems Department<br />
Navy Personnel Research and Development Center<br />
San Diego, California 92152-6800<br />
INTRODUCTION<br />
The Armed Services Vocational Aptitude Battery (ASVAB) is used by all the U.S.<br />
military services for both enlistment screening and classification into entry-level<br />
training. The current battery includes ten tests. The eight power tests are: General<br />
Science, Arithmetic Reasoning, Word Knowledge, Paragraph Comprehension, Auto<br />
and Shop Information, Mathematics Knowledge, Mechanical Comprehension, and<br />
Electronics Information. The two speeded tests are: Numerical Operations and Coding<br />
Speed. Administration of this conventional, paper-and-pencil test battery takes<br />
between 3 and 3 1/2 hours.<br />
The U.S. <strong>Military</strong> Entrance Processing Command (USMEPCOM) administers<br />
ASVAB under two Department of Defense testing programs. In the Enlistment<br />
<strong>Testing</strong> Program, ASVAB is administered to over 800,000 applicants each year, in<br />
approximately 70 <strong>Military</strong> Entrance Processing Stations (MEPS) and 970 Mobile<br />
Examining Team Sites (METS) nationwide. In the Student <strong>Testing</strong> Program, ASVAB is<br />
administered to over 1,000,000 students annually, in over 15,000 schools.<br />
CAT-ASVAB Program<br />
Roles. The U.S. Department of Defense initiated a Joint-Service research<br />
program to develop a Computerized Adaptive <strong>Testing</strong> (CAT) version of the battery<br />
(CAT-ASVAB) in FY 1979. At that time, the Department of the Navy was designated as<br />
Executive Agent, with the Marine Corps as Lead Service. Subsequently, the Lead<br />
Service responsibility was assigned to the Navy. The Navy Personnel Research and<br />
Development Center (NPRDC) was designated as the Lead R&D Laboratory. The Air<br />
Force was assigned responsibility for the development of the large banks of test items<br />
needed for CAT-ASVAB. The Army was assigned responsibility for the procurement,<br />
deployment, and implementation of the full-scale operational testing system.<br />
Objectives. The Joint-Service CAT-ASVAB Program has three objectives: (1)<br />
develop a CAT version of the ASVAB, (2) develop a computer-based delivery system<br />
that will support the new test battery, and (3) evaluate CAT-ASVAB as a potential<br />
replacement for the paper-and-pencil version of the battery (P&P-ASVAB).<br />
* The opinions expressed in this paper are those of the author, are not official, and do not<br />
necessarily represent those of the Navy Department.<br />
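As background on what "adaptive" means here: a CAT selects each next item to be maximally informative at the examinee's current ability estimate, which is why far fewer items are needed than on a fixed form. The sketch below uses a two-parameter logistic (2PL) IRT model for illustration only; it is not the CAT-ASVAB item-selection algorithm, and the item parameters are invented:<br />

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT probability of a correct response at ability theta."""
    return 1 / (1 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information the item contributes at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1 - p)

def next_item(theta: float, pool: list) -> tuple:
    """Pick the unadministered item most informative at theta --
    the core idea of adaptive item selection."""
    return max(pool, key=lambda item: item_information(theta, *item))

# Invented item pool of (discrimination a, difficulty b) pairs.
pool = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.5)]
print(next_item(0.1, pool))  # (1.2, 0.0)
```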
Purpose<br />
ACCELERATED CAT-ASVAB PROJECT<br />
The Accelerated CAT-ASVAB Project (ACAP) is designed to develop and field-test<br />
CAT-ASVAB in the shortest time possible. The idea is to collect “lessons learned”<br />
information about the new test battery, the delivery system (including both<br />
hardware and software), and the testing environment. The information obtained will<br />
be used to specify the functional requirements for the full-scale, operational system.<br />
Delivery System<br />
The Hewlett-Packard Integral Personal Computer (HP-IPC) was selected for the<br />
ACAP System. The HP-IPC is a powerful microcomputer system which, for the<br />
examinee testing station, includes: a Motorola 68000, 16/32 bit microprocessor; a<br />
graphics co-processor; 1.5 megabytes of Random Access Memory (RAM); a built-in,<br />
3.5-inch, 710K byte microfloppy disk drive; a 9-inch amber, high-contrast<br />
electroluminescent flat screen display, supporting a 255 by 512 pixel, bit-mapped<br />
display, and a standard window size of 24 lines by 80 characters; a 90-key, low-profile,<br />
detachable keyboard (which was modified by a template, leaving only the<br />
necessary keys exposed); and a built-in inkjet printer. The entire computer has a 7<br />
by 16 inch footprint, requiring less than one square foot of desk space. The software,<br />
developed at NPRDC, is written in the C programming language, running under a<br />
UNIX operating system (HP-UX).<br />
Field Research Activities<br />
The Accelerated CAT-ASVAB Project (ACAP) involves six field research<br />
activities:<br />
Pre-Test. The purpose of this research was to insure that examinees could<br />
easily use the CAT-ASVAB System (including hardware and software). <strong>Military</strong><br />
recruits and students from high school special education classes were administered<br />
CAT-ASVAB. In the aggregate, they represented the full range of mental ability.<br />
Results were very encouraging. The examinees found CAT-ASVAB easier and faster<br />
than paper-and-pencil tests that they had taken. They liked the fact that it was self-paced,<br />
and involved little writing. Some examinees expressed concern that they<br />
could not skip over items, nor go back to previous items and change their answers.<br />
Some examinees indicated that their eyes became tired, which emphasized the<br />
importance of avoiding glare on the screens. Administration instructions were<br />
revised, based upon information from questionnaires and interviews. This revision<br />
reduced the reading grade level from the eighth to the sixth grade. The Pre-Test was<br />
completed in November 1986.<br />
Medium of Administration. The purpose of this research was to evaluate the<br />
effect of the calibration medium of administration on score precision. The subjects<br />
were from the Navy Recruit Training Center in San Diego. Forty-item conventional<br />
tests were constructed for General Science, Arithmetic Reasoning, Word Knowledge,<br />
Shop Information, and Paragraph Comprehension. Subjects were randomly assigned<br />
to one of three groups. The first group took the tests on computer; these data were<br />
used to obtain a computer-based calibration of items. The second group took the same<br />
tests in a paper-and-pencil mode; these data were used to obtain paper-and-pencil<br />
calibration information. Each of these calibrations was used to estimate the ability of<br />
examinees assigned to the third group, who took the tests on computer. Lengthy test<br />
administration time required splitting the study into two phases. General Science,<br />
Arithmetic Reasoning, Word Knowledge, and Shop Information were addressed in the<br />
first phase. Results from this phase showed: (a) no practical differences in the<br />
estimation of abilities; (b) small, but statistically significant differences in different<br />
tests; and, (c) no significant differences in test reliabilities. The second phase<br />
involved the administration of the Paragraph Comprehension test. Data for this<br />
second phase have been collected and analyses are underway.<br />
Cross-Correlation. The purpose of this research was to compare the<br />
measurement precision of CAT-ASVAB and P&P-ASVAB. Subjects were from the Navy<br />
Recruit Training Center, San Diego. Each recruit had taken an operational form of<br />
P&P-ASVAB which was used for enlistment purposes. The total sample was split into<br />
two groups. The first group took CAT-ASVAB Form 1, then CAT-ASVAB Form 2. The<br />
second group took P&P-ASVAB Form 9B, then P&P-ASVAB Form 10B. The second test<br />
for each group was administered about five weeks after the first test. Results indicate<br />
that, despite using substantially fewer items, CAT-ASVAB exhibits significantly<br />
higher alternate form reliability than P&P-ASVAB for most tests, while no P&P-<br />
ASVAB test demonstrates significantly higher reliability than the comparable CAT-<br />
ASVAB test.<br />
Preliminary Operational Check. The purpose of this research was to<br />
demonstrate the communications interface between the ACAP System and USMEPCOM<br />
computer system. The testing procedures were performed jointly by NPRDC and<br />
USMEPCOM personnel at the Seattle MEPS. Data from examinees were loaded onto the<br />
Data Handling Computer at the MEPS, then transferred to the USMEPCOM System-80<br />
minicomputer. Comparison of the data before and after the transfer showed the<br />
procedure was completed with perfect accuracy.<br />
Score Equating Development. The purpose of this research was to equate CAT-<br />
ASVAB with P&P-ASVAB. Equating is essential to insure that the two forms of the<br />
battery are on the same metric, and that the scores are interchangeable. Subjects<br />
were applicants for enlistment at six MEPS (San Diego, Richmond, Seattle, Boston,<br />
Omaha, and Jackson) and their satellite MET sites. These six MEPS/METS complexes<br />
were selected because, in the aggregate, their applicants are representative of the<br />
nation. The operational measures included P&P-ASVAB Forms 10A, 10B, 11A, 11B, 13A,<br />
and 13B. There were two forms of the CAT-ASVAB (both non-operational). Finally,<br />
P&P-ASVAB Form 8A was used as the non-operational reference battery. Subjects<br />
were randomly assigned to one of three groups. The first group took CAT-ASVAB<br />
Form 1, then the operational P&P-ASVAB. The second group took CAT-ASVAB Form 2,<br />
then the operational P&P-ASVAB. The last group took the reference battery (P&P-<br />
ASVAB Form 8A), then the operational P&P-ASVAB. In each case, the testing was<br />
done on the same day or on successive days. Data collection, editing, and equating<br />
analyses have been completed. New equating procedures have been developed and<br />
applied. Analyses indicated that composite equatings were unnecessary. Provisional<br />
equating tables for operational use in<br />
the subsequent Score Equating Verification<br />
study were developed. The ACAP microcomputer delivery system has performed<br />
satisfactorily, exhibiting fewer problems than anticipated. Finally, the logistics of<br />
testing in the numerous, heterogeneous MEPS/MET sites nationwide has presented no<br />
insurmountable problems.<br />
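The study's actual equating procedures are not described here, but the general idea of putting two forms "on the same metric" can be illustrated with a bare-bones equipercentile mapping (purely a sketch; no smoothing, and all data invented):<br />

```python
def equipercentile_equate(new_scores: list, ref_scores: list, x: float):
    """Map score x on the new form to the reference-form score holding
    the same percentile rank (illustrative only)."""
    new_sorted = sorted(new_scores)
    # percentile rank of x among scores on the new form
    rank = sum(1 for s in new_sorted if s <= x) / len(new_sorted)
    # reference-form score at that same rank
    ref_sorted = sorted(ref_scores)
    idx = min(int(rank * len(ref_sorted)), len(ref_sorted) - 1)
    return ref_sorted[idx]

# Invented example: the reference form runs on twice the score scale,
# so a new-form 49 maps to roughly a reference-form 100.
print(equipercentile_equate(list(range(100)), [2 * i for i in range(100)], 49))  # 100
```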
Score Equating Verification. The Score Equating Verification study is designed<br />
to evaluate the effect of examinee motivation upon item calibration and equating.<br />
The examinees are applicants for military service who are processing through the<br />
same six MEPS/METS complexes used in the Score Equating Development study. The<br />
measures include two forms of CAT-ASVAB and one form of P&P-ASVAB (8A). The<br />
CAT-ASVAB scores are based on the provisional equating tables developed in the<br />
Score Equating Development study, and count as scores of record for enlistment. Data<br />
collected during this study will be used to develop final equating tables for<br />
subsequent operational use. Data collection began on 3 September 1990. This was the<br />
first time that CAT-ASVAB test results have counted as scores of record and, therefore,<br />
determined enlistment eligibility and subsequent training opportunities for<br />
applicants to the military services. Plans call for data collection to be completed in<br />
June 1992, analyses to be completed in November 1992, and results documented by<br />
May 1993.<br />
Technical Base R&D<br />
ENHANCED CAT-ASVAB<br />
During the past several years, each of the Service R&D laboratories has been<br />
investigating computer-administered tests which measure abilities not measured by<br />
the current ASVAB tests. These include measures of psychomotor ability, spatial<br />
ability, and working memory.<br />
Technical Advisory Selection Panel<br />
A Joint-Service Technical Advisory Selection Panel (TASP) was established to<br />
evaluate new computerized tests which showed promise and to nominate <strong>Military</strong><br />
Occupational Specialities (MOSS) for a Joint-Service Enhanced CAT-ASVAB @CAT)<br />
validation study. This committee was chaired by a representative of the Defense<br />
Manpower Data Center (DMDC) and included a technical representative from each of<br />
the Services and USMEPCOM. General criteria employed in evaluating the alternative<br />
tests included the theoretical development of the underlying construct, measurement<br />
precision, validity, equating, and operational feasibility.<br />
Joint Service ECAT Validation Study<br />
The TASP recognized that the amount of testing time available in the field was<br />
limited, and that not all promising tests could be administered. Therefore, the tests<br />
are grouped into primary and secondary categories. The primary group, to be<br />
administered to all examinees, includes: (1) Integrating Details, (2) Target<br />
Identification, (3) Figural Reasoning, (4) Two-Hand Tracking, and (5) Sequential<br />
Memory. The secondary group includes: (1) Assembling Objects, (2) Orientation, (3)<br />
One-Hand Tracking, and (4) Mental Counters. These secondary tests will be<br />
administered only in those situations where time permits.<br />
The <strong>Military</strong> Occupational Specialties (MOSs) involved in the Army include:<br />
Infantryman (11H), Cannon Crewman (13F), and Tank Crewman (19K). Air Force jobs<br />
include: Air Traffic Controller (27230) and Personnel Specialist (73230). The Marine<br />
Corps MOSs will include: Motor Transportation (35XX) and Aircraft Maintenance<br />
(61XX). Finally, the Navy ratings will include: Air Traffic Controller (AC),<br />
Operations Specialist (OS), Fire Controlman (FC), Electronics Technician/Advanced<br />
Electronics Field (ET (AEF)), Radioman (RM), Engineman (EN), Aviation Structural<br />
Mechanic - Structures (AMS), Aviation Electrician’s Mate (AE), Aviation Electronics<br />
Technician (AT), Aviation Fire Control Technician (AQ), Aviation Antisubmarine<br />
Warfare Technician (AX), Aviation Ordnanceman (AO), Gunner’s Mate - Phase I<br />
(GMG), Machinist’s Mate (MM), and Electrician’s Mate (EM).<br />
Data collection for the Joint-Service ECAT Validation study began in February<br />
1990 and will continue through August 1991, with analyses, documentation, and<br />
reviews scheduled for completion in July 1992.<br />
Navy ECAT Validation Study<br />
The purpose of this study is to determine the incremental validity of new<br />
predictor tests for augmenting ASVAB for selected Navy ratings. It will provide<br />
additional information to the Joint-Service ECAT Validation study described above for<br />
assessing the cost-effectiveness of computerized testing.<br />
The experimental test battery includes six tests, followed by a seven-item<br />
questionnaire. Average administration time is 2 l/2 hours. The six tests are: (1)<br />
Mental Counters, (2) Sequential Memory, (3) Integrating Details, (4) Space<br />
Perception, (5) Spatial Reasoning, and (6) Perceptual Speed. The short questionnaire<br />
is designed to obtain information on examinee fatigue, motivation, and computer<br />
experience.<br />
Data have been collected from the following Navy schools: Operations<br />
Specialist (OS), Aviation Structural Mechanic - Structures (AMS), Aviation<br />
Ordnanceman (AO), Aviation Electronics Technician (AT), Aviation Fire Control<br />
Technician (AQ), Aviation Antisubmarine Warfare Technician (AX), Gunner’s Mate -<br />
Phase I (GMG), Machinist’s Mate (MM), Propulsion Engineering Basics, Aviation<br />
Machinist’s Mate (AD), Boiler Technician (BT), Hospitalman (HM), and Hull<br />
Maintenance Technician (HT).<br />
Data collection for the Navy ECAT Validation study has been completed.<br />
Analyses, documentation, and review of the results should be completed in December<br />
1990.<br />
CONCEPT OF OPERATIONS<br />
The concept of operations for the CAT-ASVAB System has not been finalized.<br />
In a previous study, four alternative deployment strategies were selected for special<br />
attention: (1) Centralized CAT-ASVAB testing at MEPS, with elimination of all METS<br />
testing; (2) High Volume Site <strong>Testing</strong> (all MEPS and 273 METS); (3) use of a CAT<br />
screening instrument at the military recruiting stations, with subsequent full CAT-<br />
ASVAB testing of screened personnel at MEPS, and (4) administration of CAT-ASVAB<br />
in mobile vans, testing at MEPS and fifty high-volume METS. The current<br />
operational scenario involving the administration of P&P-ASVAB in all MEPS and<br />
METS provided a baseline case for comparison purposes.<br />
ECONOMIC ANALYSES<br />
Department of Defense and Department of the Navy regulations require<br />
performing an economic analysis to assist in determining whether or not a system<br />
is cost-effective. An initial study was conducted by a contractor, whose<br />
representatives visited each of the MEPS in the continental United States to collect<br />
cost information in four areas: (1) development, (2) procurement, (3)<br />
implementation, and (4) operations and support.<br />
The Brogden-Cronbach-Gleser approach to test utility evaluation was<br />
employed. This approach assesses the dollar utility of the incremental validity of a<br />
new instrument (e.g., CAT-ASVAB) over the validity of an existing instrument (e.g.,<br />
P&P-ASVAB) in terms of improved performance. A conventional ten-year economic<br />
life was used, and the net life cycle benefit computed for each alternative concept of<br />
operation. The incremental validity used for CAT-ASVAB was 0.002, a conservative<br />
estimate based upon simulation results assessing the increased precision of CAT-<br />
ASVAB over P&P-ASVAB. The results appear promising for two concepts of operation:<br />
centralized testing, and the recruiter screening approach. The high-volume site and<br />
mobile van concepts were not cost-effective.<br />
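The Brogden-Cronbach-Gleser computation reduces to a product of terms. In the sketch below, only the 0.002 incremental validity comes from the text; the selectee count, economic life, dollar SD of performance, and mean predictor score are hypothetical placeholders, and selection and operating costs are omitted:<br />

```python
def incremental_utility(n_selected: int, years: int, delta_validity: float,
                        sd_y: float, mean_z: float) -> float:
    """Brogden-Cronbach-Gleser gain from the more valid instrument:
    delta_U = N * T * delta_r * SD_y * z_bar (costs omitted)."""
    return n_selected * years * delta_validity * sd_y * mean_z

# Hypothetical: 200,000 selectees/year over a 10-year economic life,
# SD of job performance valued at $10,000, mean predictor z of 1.0.
print(incremental_utility(200_000, 10, 0.002, 10_000, 1.0))  # 40000000.0
```

Even a tiny validity increment scales to a large dollar figure, which is why the actual increment is the pivotal empirical question.<br />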
A pivotal issue in these economic analyses is the actual increment in validity<br />
which can be expected by using CAT-ASVAB instead of P&P-ASVAB. While simulation<br />
results were adequate for initial analyses, empirical data are necessary for any<br />
conclusive evaluation. Therefore the Manpower Accession Policy Steering<br />
Committee (MAPSC) instructed the Executive Agent to evaluate new tests which<br />
offered significant promise for enhancing the predictive effectiveness of the<br />
current battery.<br />
A final economic analysis study will be performed under contract. Results<br />
from this study will be crucial in determining whether or not CAT-ASVAB should be<br />
implemented nationwide.<br />
ASSESSMENT OF APTITUDE REQUIREMENTS FOR<br />
NEW OR MODIFIED SYSTEMS<br />
Lawrence H. O’Brien<br />
Dynamics Research Corporation<br />
Wilmington, MA<br />
INTRODUCTION<br />
Recent Department of Defense initiatives on manpower, personnel, and training<br />
call for an assessment of the “aptitude requirements of new systems.” For example,<br />
AR 602-2, Manpower and Personnel Integration (MANPRINT) in the Materiel<br />
Acquisition Process, requires that “For material with a predominant human<br />
interface, it is critical to collect and evaluate human performance reliability data to<br />
determine whether the proposed system concept will deliver expected<br />
performance with no greater aptitudes and no more training than planned.” DOD<br />
Directive 5000.53, Manpower, Personnel, Training, and Safety in the Defense<br />
Acquisition Process, requires that descriptions of the “quality and quantity of<br />
military personnel” needed to field a system be developed and updated during the<br />
acquisition process. The directive indicates that the descriptions of military<br />
personnel quality requirements “shall include distributions of skill, grade, aptitude,<br />
anthropometric and/or physical attributes, education, and training backgrounds.”<br />
KEY QUESTIONS RELATED TO APTITUDE ASSESSMENT FOR NEW SYSTEMS<br />
Aptitude assessments for new weapon systems seek to address two basic questions:<br />
Question 1: Can the system be successfully operated and maintained by the soldiers who are expected to man it?
To determine if the system is successful, one must (a) identify the functions that<br />
the system is supposed to perform, (b) identify the measures that can be used to<br />
assess performance on these functions, (c) establish criteria for these measures,<br />
(d) either collect "test" data on or estimate system performance, and (e) compare
the system performance with the criteria. If performance exceeds the criteria, the<br />
system is judged successful.<br />
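The five-step logic in (a)-(e) reduces to a comparison of measured values against criteria. A minimal sketch, with metric names and numbers invented purely for illustration:

```python
def system_successful(measured, criteria):
    """Steps (d)-(e): compare measured performance on each function's
    metric against its criterion; the system is judged successful only
    if every criterion is met or exceeded."""
    return all(measured[metric] >= threshold
               for metric, threshold in criteria.items())

# Steps (a)-(c) identify the functions, measures, and criteria up front.
# "Lower is better" metrics are negated so that >= applies uniformly.
criteria = {"target_hit_rate": 0.80, "neg_mean_repair_hours": -4.0}
measured = {"target_hit_rate": 0.85, "neg_mean_repair_hours": -3.2}
ok = system_successful(measured, criteria)   # both criteria met
```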
The term “by the soldiers who are expected to man it” implies detailed<br />
consideration of soldier characteristics such as aptitudes. More specifically, it<br />
assumes that data will be obtained from soldiers who are “representative” of the<br />
soldiers who will actually be assigned to the system.
To identify “representative” soldiers, one must first identify the key personnel<br />
characteristics which impact soldier performance. Aptitudes such as scores on<br />
the Armed Services Vocational Aptitude Battery (ASVAB) are especially important because they
are used by the Army to control entry into the Army or into an MOS. This is accomplished
by setting cut-offs or minimum acceptable scores on these characteristics.<br />
The best way to select “representative” soldiers for inclusion in system testing is to<br />
randomly sample from a population that has the same distribution of these<br />
characteristics as the population of soldiers who are expected to man the system.
However, the future distribution of these characteristics within a particular MOS may
be different than their current distribution. Since most Army systems take 5-10
years to develop, the capability to estimate the future distribution of key personnel<br />
characteristics is a critical prerequisite for describing the soldiers who are likely to<br />
be available to man the system.
Estimating the future distributions of these aptitudes is not simple since these<br />
distributions are impacted by a number of factors. First, the distributions are<br />
impacted by the cutoffs that the Army sets for these aptitude measures. These<br />
cutoffs eliminate soldiers who score below the cutoffs both from accessions and<br />
from distributions of the aptitudes at higher paygrades. However, the cutoffs are<br />
not the only factors determining these distributions. The distributions are also<br />
impacted by the distribution of the aptitudes in different subpopulations of the<br />
general population at a particular point in time, the propensity of those<br />
subpopulations to enlist at various aptitude levels, and the rates (e.g., reenlistment
rates) with which the subpopulations transition through the Army personnel<br />
system.<br />
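The truncation effect described in this paragraph is easy to illustrate: imposing a cutoff removes the lower tail, which both shrinks the eligible pool and raises the mean of the surviving group. The sketch below assumes normally distributed scores with an illustrative mean of 100 and SD of 20; it is not part of any Army estimation methodology.

```python
from math import exp, sqrt, pi, erf

def norm_pdf(x):
    # standard normal density
    return exp(-x * x / 2) / sqrt(2 * pi)

def norm_cdf(x):
    # standard normal cumulative distribution via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

def surviving_fraction(cutoff, mean=0.0, sd=1.0):
    """Fraction of the applicant population at or above the cutoff."""
    z = (cutoff - mean) / sd
    return 1 - norm_cdf(z)

def mean_above_cutoff(cutoff, mean=0.0, sd=1.0):
    """Mean aptitude of those who survive the cutoff (truncated normal):
    mean + sd * pdf(z) / (1 - cdf(z))."""
    z = (cutoff - mean) / sd
    return mean + sd * norm_pdf(z) / (1 - norm_cdf(z))

# Illustrative: a cutoff of 85 on a (100, 20) scale keeps ~77% of the
# applicant pool and shifts the surviving group's mean to ~107.8.
keep = surviving_fraction(85, mean=100, sd=20)
new_mean = mean_above_cutoff(85, mean=100, sd=20)
```

Note that this captures only the cutoff mechanism; as the paragraph emphasizes, the actual future distributions also depend on subpopulation mixes, enlistment propensities, and transition rates, which a realistic model would have to represent separately.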
Question 2: Can the system be operated and maintained within available manpower, personnel, and training resource constraints?
This question seeks to assess the “personnel affordability” of the new system. The<br />
resource capabilities of the Army are limited. The total end strength of the Army is<br />
fixed annually by Congress. The Army’s capability to recruit high quality personnel<br />
is restricted by the recruiting budget. To effectively deal with these resource<br />
limitations, the Army must set constraints for critical resources such as personnel.<br />
During the acquisition process, resource requirements for the new system must be<br />
established and compared with the constraints. If the requirements do not exceed<br />
the constraints, the system is affordable; otherwise, it is not. Manpower constraints
describe the maximum number of people who will be available to man the new<br />
system. Personnel constraints describe: (a) expected cutoff values for key<br />
characteristics such as aptitude, and (b) the expected distribution of these<br />
characteristics above the cutoff.<br />
APPROACH FOR ASSESSING APTITUDE IMPACTS ON SYSTEM PERFORMANCE<br />
The relationship between aptitudes and system performance is not a direct one.<br />
Aptitudes impact the performance of the tasks required to operate or maintain the<br />
system. Performance on these individual tasks determines overall system<br />
performance. Assessing the relationships between the performance of individual<br />
system tasks and system performance requires consideration of the complex causal<br />
and sequential relationships among tasks. Task performance will vary as a function<br />
of the conditions under which the tasks will be performed. These conditions will<br />
vary across time and across scenarios.<br />
Measures of System Performance. A number of metrics can be used to quantify
system performance. Typically, two types of measures are developed: operational<br />
effectiveness (e.g. mission performance time or success) and system availability<br />
(e.g. system reliability, availability, or maintainability).<br />
APTITUDE ASSESSMENT TOOLS FOR NEW SYSTEMS
As part of the HARDMAN III program, the Army Research Institute (ARI) has
developed two microcomputer-based tools that can be used to assist Army analysts<br />
in identifying aptitude requirements and constraints for new systems--the<br />
Personnel Constraints Aid or P-CON and the Personnel-Based System Evaluation<br />
Aid or PER-SEVAL.[1]
P-CON. The P-CON Aid estimates personnel quality constraints. More specifically, it
estimates the future distribution of key personnel characteristics. These<br />
distributions describe the numbers and percentages of personnel that will be
available at each level of the personnel characteristics. The P-CON Aid also<br />
provides guidance to help Army analysts and contractors understand the impacts of
setting constraints at different personnel characteristic levels. For example, the<br />
P-CON Aid will display the levels of performance that can be expected at each of<br />
these levels. An analyst can use the information on expected performance to set
personnel constraint levels for each characteristic.<br />
The P-CON Aid first estimates what the future distribution of the personnel<br />
characteristics will be. Then, it uses results from analyses of the Project A data<br />
base to show what levels of performance are achievable at different characteristic<br />
levels. The user may then use the information on both personnel availability and<br />
performance to identify minimum acceptable levels for each personnel<br />
characteristic.<br />
PER-SEVAL. The PER-SEVAL Aid determines what level of personnel<br />
characteristics is needed to meet system performance requirements given a<br />
particular contractor’s design, fixed amounts of training, and the specific<br />
conditions of performance under which the system tasks will be performed.<br />
The PER-SEVAL Aid has three basic components. First, PER-SEVAL has a set of
performance shaping functions that predict performance as a function of ASVAB<br />
area composite and training. Separate functions are provided for different types of<br />
tasks. The primary data source for developing the functions was regression
analyses of the Project A data base. Second, the PER-SEVAL Aid has
a set of stressor degradation algorithms that degrade performance to reflect the<br />
presence of critical environmental stressors. Third, the PER-SEVAL Aid has a set<br />
of operator and maintainer models that aggregate the performance estimates of<br />
individual tasks and produce estimates of system performance.<br />
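The three components can be pictured as a pipeline: shape, degrade, aggregate. The sketch below mirrors that flow only in outline; the functional forms, coefficients, and the multiplicative aggregation are invented for illustration and are not PER-SEVAL's actual algorithms.

```python
def shaped_performance(aptitude, training_hours, a=0.4, b=0.02, base=0.2):
    """Hypothetical performance-shaping function: probability of correct
    task performance as a function of a standardized aptitude score and
    training time. Coefficients are illustrative, not PER-SEVAL's."""
    p = base + a * aptitude + b * training_hours
    return min(max(p, 0.0), 1.0)          # clamp to a valid probability

def degrade(p, stressor_factors):
    """Multiplicative stressor degradation (e.g., heat, protective gear).
    Each factor in (0, 1] scales the undegraded task probability down."""
    for f in stressor_factors:
        p *= f
    return p

def system_performance(task_probs):
    """Simplest possible aggregation model: the mission succeeds only if
    every task in the sequence is performed correctly (independence
    assumed). Real operator/maintainer models are far richer."""
    result = 1.0
    for p in task_probs:
        result *= p
    return result

# Three identical tasks, shaped by aptitude and training, then degraded
# by two stressors, then aggregated to a system-level estimate.
tasks = [degrade(shaped_performance(0.5, 20), [0.95, 0.9]) for _ in range(3)]
mission_p = system_performance(tasks)
```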
RECONCILING THE JOB-BASED AND SYSTEM-BASED APPROACHES TO
APTITUDE REQUIREMENTS ASSESSMENT<br />
Assessment of aptitude requirements requires consideration of the impact of<br />
aptitudes on “performance”. The personnel and the system development<br />
communities have different conceptualizations of performance. The personnel<br />
community tends to focus on “job performance” while the system development<br />
[1] HARDMAN III is a major developmental effort of ARI's System Research
Laboratory. Its objective is to develop a set of automated aids to assist Army analysts<br />
in conducting MANPRINT assessments during the Materiel Acquisition Process<br />
(MAP).<br />
community tends to focus on "system performance." Traditionally, most of the
previous work on assessing aptitude requirements has been based on the job
performance perspective. Yet, aptitude requirements (i.e., ASVAB area composite
cutoffs) are set for occupational specialties, not weapon systems. The tasks
associated with a particular weapon system may constitute only a subset of the total
number of tasks assigned to a particular occupational specialty.
Figure 1 displays a strategy for linking the job-based and system-based approaches<br />
for aptitude assessment. Prior to the development of the new system, the<br />
personnel community will set an ASVAB area composite cut-off for each MOS. It is assumed
that the process for setting this cut-off will include consideration of the impact of<br />
the cut-off on "job performance." During the system development process, the P-
CON and PER-SEVAL tools can be applied to determine the impact of the cut-off on<br />
system performance. P-CON can be used to project what the future distribution of<br />
personnel will be at or above the cut-off and PER-SEVAL can be used to determine<br />
if this population can successfully meet system performance requirements. If
system performance is adequate, no change in aptitude cut-off is needed. If system<br />
performance is not adequate, the possibility of using higher cut-offs can be<br />
examined. The P-CON tool can be used to examine the impact of higher cut-offs on<br />
personnel availability (i.e., the numbers of people at or above the cut-off). P-CON
outputs can be used to assess the impact of personnel availability on the Army’s<br />
ability to provide the manpower to successfully man the new system. Another ARI<br />
tool, the Army Manpower Cost System or AMCOS, can be used to assess the
personnel costs associated with recruiting higher aptitude personnel. The<br />
information on system performance, personnel availability, and personnel costs can<br />
then be used by the personnel community in reassessing the MOS cut-off. It is<br />
assumed that this assessment will consider the impact of the aptitude change on<br />
total “job performance.”<br />
[Figure 1 is a flow diagram: the personnel community sets the MOS cut-off; the impact of higher cut-offs on system performance, personnel availability, and cost is assessed; and the personnel community then reassesses the cut-off.]
Figure 1. Potential relationships between job and system perspectives
Currently, most personnel psychologists view job performance as a multidimensional
construct. For example, using data obtained from the Project A study, Campbell,
McHenry, and Wise (1990) have developed a model of Army job performance that
has five factors: core technical proficiency, general soldier proficiency, effort and<br />
leadership, personal discipline, and physical fitness and military bearing. Clearly,<br />
performance on system performance tasks is closely related to one of these<br />
components--technical proficiency. As Sadacca, Campbell, DiFazio, Schultz, and
White (1990) have pointed out, the utility of the different job components may vary
across jobs. The need to raise a particular MOS ASVAB cut-off will depend on how<br />
much importance Army decision makers attach to technical proficiency vice the<br />
other job components for the particular MOS being investigated.<br />
REFERENCES

Army Regulation 602-2, 19 April 1990, Manpower and Personnel Integration (MANPRINT) in the Materiel Acquisition Process.

Campbell, J. P., McHenry, J. J., & Wise, L. L. (1990). Modeling job performance in a population of jobs. Personnel Psychology, 43, 313-333.

DOD Directive 5000.53, 30 December 1988, Manpower, Personnel, Training, and Safety in the Defense Acquisition Process.

Sadacca, R., Campbell, J. P., DiFazio, A. S., Schultz, S. R., & White, L. A. (1990). Scaling performance utility to enhance selection/classification decisions. Personnel Psychology, 43, 367-378.

System Research and Applications Corporation. (1990). Army Manpower Cost System Active Component Life Cycle Cost Estimation Model Information Book. Arlington, VA.
The Practical Impact of Selecting TOW Gunners<br />
with a Psychomotor Test<br />
Amy Schwartz and Jay Silva<br />
The U.S. Army Research Institute for the<br />
Behavioral and Social Sciences<br />
The ongoing reduction in defense forces has focused the<br />
interest of Army management on how to maintain current deterrent<br />
and combat power with fewer soldiers. One approach is to improve<br />
the person-to-job match in entry positions. A better match may<br />
lead to lowered attrition and better performance among those who
are selected. New selection tests, developed through the Army's<br />
Project A (Campbell, 1990), have been shown to contribute<br />
significantly to the prediction of training performance in a<br />
variety of MOS (e.g., Busciglio, 1990; Busciglio, Silva & Walker,<br />
1990). If these tests were used to classify recruits who have<br />
been selected into a family of MOS, an increase in assignment<br />
efficiency into specific MOS could result.<br />
One application of a newly developed psychomotor test is the<br />
prediction of 11H TOW (Tube-launched, Optically-tracked, Wire-guided)
gunner performance. Currently, recruits are accessioned<br />
into the generic MOS 11X (Infantryman) using the Combat (CO)<br />
composite of the Armed Services Vocational Aptitude Battery<br />
(ASVAB). They are later classified into one of four Infantry MOS
including 11H TOW gunners. Previous research found that<br />
psychomotor tests, especially one which required two-hand<br />
tracking of a target (Two-hand Tracking test), accounted for a<br />
significant amount of variance of simulated gunnery performance<br />
beyond that explained by the ASVAB Combat composite for TOW<br />
gunners (Silva, 1989).<br />
The present analyses examined the practical benefits of<br />
using scores on a psychomotor test to select TOW gunners. First,<br />
the potential performance gains that can be accomplished with the<br />
additional test were examined. However, performance gains for<br />
11H's may result in decreases in the quality of recruits in the<br />
remaining Infantry MOS. Determining the overall effect of<br />
implementing the new test ideally would require criteria<br />
performance data for all Infantry MOS. Since these data were not<br />
available, the impact of the additional test was examined by<br />
comparing general quality of recruits selected into the 11H MOS<br />
with that of the remaining recruits who would be assigned to the<br />
other MOS in the 11 series. Armed Forces Qualifications Test<br />
(AFQT) scores, which are currently an accepted measure of<br />
quality, were used for this comparison. Thus, the purpose of<br />
this research is to demonstrate the contribution of Two-hand<br />
Tracking to predicting TOW gunner performance, while considering<br />
the general impact of implementing the new tests for<br />
classification purposes.<br />
METHOD

Sample
The sample consisted of 911 recruits initially selected as<br />
11X Infantrymen based on a minimum CO composite score of 85 who<br />
were then classified as 11H TOW gunners. For the present<br />
purposes, the 11H's were assumed to have been randomly chosen
from 11X's and therefore contain the same properties as the 11X<br />
population. In order to test this assumption, t-tests were<br />
conducted comparing the AFQT and CO mean scores of the current<br />
sample (AFQT M=56.66, CO M=109.71) with those of a sample of
17,000 11X's (AFQT M=57.82, CO M=110.22), and there were no
significant differences in the means. Because of this<br />
comparability, the current sample of 11H's were considered to be<br />
representative of the total 11X population.<br />
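The representativeness check can be reproduced from summary statistics using Welch's two-sample t. The means and sample sizes below come from the text; the standard deviations (here 20) are assumed, since the paper reports only means, so the resulting statistic is illustrative.

```python
from math import sqrt

def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's two-sample t statistic computed from summary statistics
    (unequal variances, unequal ns)."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / se

# AFQT means from the paper: 56.66 for the 911 11H's vs. 57.82 for the
# 17,000 11X's. The SDs of 20 are an assumption.
t = welch_t(56.66, 20.0, 911, 57.82, 20.0, 17_000)
```

Under these assumed SDs, |t| falls below the 1.96 two-tailed criterion, consistent with the reported non-significance of the mean difference.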
Procedure<br />
Recruits were given the Two-hand Tracking test along with<br />
other psychomotor measures during in-processing at the Reception<br />
Battalion. Classification of the examinees into specific MOS was<br />
not based on Two-hand Tracking scores. The procedure for<br />
assignment appears to be based on demand from each of four<br />
possible assignment MOS. During TOW gunnery training, gunnery<br />
data were collected using high-fidelity gunnery simulators.<br />
Measures<br />
Two-hand Tracking. This test measures two-hand coordination on a<br />
scale of distance from target accuracy. This score has been<br />
standardized (T distribution) and inverted such that a higher<br />
score indicates better performance than a lower score.<br />
Combat. This composite is the sum of four standardized ASVAB<br />
subtests: Arithmetic Reasoning (AR), Auto and Shop Information<br />
(AS), Coding Speed (CS), and Mechanical Comprehension (MC). A
score of at least 85 is needed to qualify for the 11X MOS.<br />
Combined Score. This is the optimally weighted predicted<br />
composite of both Two-hand Tracking and Combat.<br />
Training course performance. The criterion scores indicate<br />
performance on a TOW anti-tank gunnery simulator which requires<br />
the gunner to track a moving target (i.e., a target mounted on a<br />
moving vehicle) through an infrared optical device. The two<br />
measures of interest in the present study include Event 3, the<br />
trainee's score on the first qualifying set (an index of time on<br />
target) and Pass 3, whether the trainee passed or failed on the<br />
first qualifying set.<br />
RESULTS AND DISCUSSION<br />
Table 1 shows the correlations among the predictors and<br />
criteria. The joint effect of using both CO and Two-hand<br />
Tracking scores has been included under the heading 'combined.'<br />
The correlations of 'combined' with Combat and Two-hand Tracking<br />
are provided as an indication of the weights of each predictor in<br />
the optimal linear combination.<br />
To evaluate the practical significance of this method it<br />
must first be demonstrated that the proposed predictors will
improve the prediction of training performance. This is<br />
supported by the multiple regression results (see Table 2). The<br />
predictors significantly explain variance in performance on Event<br />
3 when they are used alone [CO F(1,909)=49.84; Two-hand Tracking F(1,909)=94.06; both p < .0001].
Table 1

Correlation Matrix of Predictors and Criteria

                    Criteria               Predictors
              Event 3    Pass 3       Combat    Two-hand
Pass 3         .76**
Combat         .23**      .12**
Two-hand       .31**      .23**        .34**
Combined       .33**      .23**        .68**      .92**

Note. **p < .0001. "Combined" represents the correlation between the predictor and the predicted values based on a linear combination of both predictors.
A second practical concern is the quality of the remaining<br />
recruits to be assigned to the MOS in the 11X series. If all of<br />
the recruits who score high on the additional tests are placed<br />
into one MOS, the remaining MOS will receive less qualified<br />
individuals. It has already been demonstrated that Two-hand<br />
Tracking is as good (if not better) a predictor of TOW gunner<br />
performance as CO. It remains to be shown that selection based<br />
on Two-hand Tracking will lead to less of a decrease in the<br />
quality of the remaining recruits than CO.<br />
Table 2

Predicting Performance on Event 3 Using CO and Two-Hand Tracking

Model                     R square     df       F
CO                        .052         1/909    49.84**
Two-Hand Tracking         .094         1/909    94.06**
CO & Two-Hand Tracking    .111         2/908    56.69**

Note. **p < .0001.
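The R-square values in Table 2 can be recovered, to rounding, from the correlations in Table 1 with the standard two-predictor multiple-R formula; the small gap between the computed .114 and the reported .111 reflects rounding of the published correlations.

```python
def r2_one(r):
    """R-squared for a single predictor."""
    return r * r

def r2_two(r1, r2, r12):
    """R-squared for two predictors from their validities (r1, r2) and
    their intercorrelation r12, via the standard two-predictor formula:
    (r1^2 + r2^2 - 2*r1*r2*r12) / (1 - r12^2)."""
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)

# Correlations from Table 1: CO and Two-hand with Event 3, and their
# intercorrelation.
r_co, r_track, r_inter = 0.23, 0.31, 0.34
r2_co = r2_one(r_co)                        # ~.053 (reported .052)
r2_track = r2_one(r_track)                  # ~.096 (reported .094)
r2_both = r2_two(r_co, r_track, r_inter)    # ~.114 (reported .111)
```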
Table 3

Mean Performance at Cutoffs for AFQT, Actual and Predicted Scores on Event 3 and Passing Rate at Event 3

Predictor             SR     AFQT     Actual Score    Pred. Score    Pass/fail
                                      at Event 3      at Event 3     Event 3
Combat                20%    78.14    646.95          650.96         .879
                      50%    68.40    633.38          633.52         .850
                      80%    61.38    620.44          619.49         .834
Two-hand Tracking     20%    63.75    650.66          659.04         .912
                      50%    60.79    640.39          640.35         .890
                      80%    59.51    627.41          624.20         .853
Combat & Two-Hand     20%    71.10    654.57          664.02         .907
  Tracking            50%    64.97    643.38          643.15         .881
                      80%    60.17    625.28          625.40         .848
No Selection                 56.66    609.94                         .814
Raising the cutoff on CO would lead to higher mean AFQT scores
for 11H, especially at lower SRs. This would lead to a depletion
of high ASVAB quality recruits for the other Infantry MOS.<br />
Selection based on Two-hand Tracking scores also increases the mean<br />
AFQT score for 11H, but to a much lesser extent than either CO or
the two predictors combined. For example, a 50% SR on Two-hand<br />
Tracking would produce a higher pass rate on Event 3 than a 20% SR<br />
using CO, yet it would lead to much less of an increase in mean<br />
AFQT scores. Therefore, Two-hand Tracking, compared to CO, is<br />
better able to minimize AFQT impact while improving outcomes for<br />
11H's.<br />
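The depletion pattern in Table 3 can be reproduced in miniature with synthetic data: selecting the top of the distribution on a test strongly correlated with AFQT (like CO) pulls the selected group's mean AFQT up far more than selecting on a weakly correlated test (like Two-hand Tracking). The correlations and score scales below are illustrative, not the study's.

```python
import random

def select_top(scores, other, sr):
    """Select the top fraction `sr` on `scores`; return the mean of the
    paired `other` measure among those selected."""
    paired = sorted(zip(scores, other), reverse=True)
    k = max(1, int(len(paired) * sr))
    return sum(o for _, o in paired[:k]) / k

random.seed(0)
n = 10_000
afqt = [random.gauss(50, 10) for _ in range(n)]
# One predictor strongly correlated with AFQT (~.8, like CO) and one
# weakly correlated (~.2, like Two-hand Tracking); both have SD ~10.
strong = [0.8 * a + random.gauss(0, 6) for a in afqt]
weak = [0.2 * a + random.gauss(0, 9.8) for a in afqt]

# Mean AFQT of the top 20% selected on each predictor: selecting on the
# strongly correlated test depletes far more high-AFQT recruits from
# the remaining pool than selecting on the weakly correlated one.
m_strong = select_top(strong, afqt, 0.20)
m_weak = select_top(weak, afqt, 0.20)
```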
While the current research demonstrates a smaller depletion of
AFQT scores in remaining MOS when Two-hand Tracking is used as a<br />
classification test, future research must be conducted to evaluate<br />
the potential impact of this system on training or on-the-job<br />
performance criteria. Classification is most efficient when<br />
different skills are required for the jobs being filled. If two-hand
tracking is equally important for all Infantry positions,
selecting TOW gunners with this test may not be appropriate and<br />
will result in a depletion of necessary tracking skills of recruits<br />
from the 11-series MOS. Follow-up work can examine this by collecting
performance data from several Infantry MOS and examining the<br />
results of using a battery of tests to determine assignment.<br />
These results assume that training performance reflects on-the-job
performance. Some initial results in the field indicate
that this is true. In addition, there is some limitation in the
effectiveness of Two-hand Tracking as a predictor in this context,<br />
since the sample was already preselected based on a CO cutoff of<br />
85. More gains would most likely be found if psychomotor tests are<br />
given before recruits are assigned to even a family of MOS.<br />
However, the present results suggest that with only a slight<br />
modification of the present system, the addition of a psychomotor<br />
test can lead to improved selection without greatly depleting the<br />
quality of the recruits remaining for assignment in the remaining<br />
MOS.<br />
References

Busciglio, H. H. (1990). The Incremental Validity of Spatial and Perceptual-Psychomotor Tests Relative to the Armed Services Vocational Aptitude Battery (ARI Technical Report 883). Alexandria, VA: U.S. Army Research Institute.

Busciglio, H. H., Silva, J., & Walker, C. (1990). The Potential of New Army Tests to Improve Job Performance. Paper presented at the 1990 Army Science Conference.

Campbell, J. P. (1990). An overview of the Army selection and classification project (Project A). Personnel Psychology, 43, 231-239.

Silva, J. M. (1989). Usefulness of Spatial and Psychomotor Testing for Predicting TOW and UCOFT Gunnery Performance (ARI Working Paper WP-RS-89-21). Alexandria, VA: U.S. Army Research Institute.
VALIDATION OF A NAVAL OFFICER SELECTION BOARD<br />
Captain J.P. Bradley<br />
Canadian Forces Personnel Applied Research Unit<br />
Willowdale, Ontario, Canada M2N 6B7<br />
Introduction<br />
In 1976, the Canadian Navy established the Naval Officer
Interview Board (NOIB) for the purpose of selecting applicants for<br />
the Maritime Surface and Sub-surface (MARS), and Maritime Engineer<br />
(MARE) officer occupations. There were two components to the NOIB,<br />
a selection interview, conducted by a panel of senior naval
officers, and an orientation program, consisting of tours of naval<br />
facilities and briefings by naval officers. The purpose of the<br />
orientation component was to ensure that candidates would be able<br />
to make an informed decision to join the Navy if selected by the<br />
NOIB.<br />
By 1983, the NOIB had not reduced attrition among MARS and<br />
MARE trainees; therefore, the Naval Officer Selection Board (NOSB)<br />
was developed. The NOSB retained the orientation component and<br />
interview of the former NOIB but incorporated other assessment<br />
instruments to achieve a multi-method approach to the assessment<br />
of naval officer potential. In 1989, the NOSB was renamed the<br />
Naval Officer Assessment Board (NOAB).<br />
To become qualified MARS officers, candidates must complete
four phases of training: the Basic Officer Training Course (BOTC),
required of all Canadian Forces (CF) officer applicants regardless<br />
of military occupation, and three phases of MARS occupation<br />
qualification training. An evaluation of the NOAB's ability to<br />
predict success on BOTC by Okros, Johnston, and Rodgers (1988)<br />
demonstrated that: (a) the NOAB predicted BOTC performance better<br />
than CF recruiting centre (CFRC) measures; (b) the optimal<br />
combination of predictors produced a multiple correlation of .40;<br />
and (c) the file review was identified as the best single NOAB<br />
predictor of BOTC with a correlation of .31. The present study<br />
complements the BOTC validation study and examines the ability of<br />
the NOAB to predict MARS occupation training success.<br />
Method

Subjects
Of the 743 MARS candidates who have attended the NOAB, the 95<br />
who have gone on to complete all phases of MARS training comprised<br />
the sample for this validation study. The subjects in this study<br />
were male. Female applicants have attended the NOAB since 1988,<br />
but none have completed MARS occupational training to date.<br />
Variables<br />
Criteria. Two measures of success on MARS occupation training<br />
were used: (a) grades on the third phase (MARS III); and (b) grades<br />
on the fourth phase (MARS IV) of MARS training.
Predictors. Operational predictors used by the NOAB to assess<br />
MARS candidates included: (a) an interview; (b) a file review (an
evaluation of the biographical data collected by the CFRCs); (c)<br />
a conducting officer's assessment; (d) performance in a practical<br />
leadership exercise; (e) performance in a leaderless group<br />
discussion; and (f) a NOAB merit score (a weighted combination of
NOAB measures). Experimental predictors included: (a) the Problem<br />
Sensitivity Test (PST); and (b) the Passage Planning Test (PPT).<br />
CFRC predictors included: (a) a military potential score provided
by CFRC staff; and (b) a measure of tested learning ability based<br />
on the CF General Classification (GC) Test. The relations between<br />
BOTC performance and MARS training success were also evaluated.<br />
Results

Predicting MARS III Performance
Although Table 1 shows statistically significant correlations<br />
between MARS III results and three NOAB predictors -- file review,<br />
leadership stands, and the NOAB merit score -- multiple regression<br />
analyses revealed that the leadership stands did not provide any<br />
incremental prediction beyond that contributed by the file review<br />
(R = .20). In essence, the prediction afforded by the merit score<br />
is that provided by the file review. MARS III performance was<br />
unrelated to the following measures: (a) the interview; (b) the<br />
conducting officer's assessment; (c) the leaderless group<br />
discussion; (d) the CFRC military potential score; (e) tested<br />
learning ability; and (f) performance on BOTC.<br />
Predicting MARS IV Performance
As shown in Table 1, performance on MARS IV was related to the<br />
file review, NOAB merit score, BOTC performance, and MARS III<br />
performance. Of all the predictors, the file review accounted for<br />
the most variance in MARS IV performance. The NOAB merit score<br />
also correlated with MARS IV performance; however, the predictive<br />
contribution of the merit score was actually that provided by the
file review. Multiple regression analyses also showed that neither<br />
BOTC nor MARS III performance could account for variance of MARS<br />
IV beyond that already predicted by the file review (R = .28). The<br />
following variables were unrelated to MARS IV performance: (a) the<br />
interview; (b) the conducting officer's assessment; (c) the<br />
leaderless group discussion; (d) leadership stands; (e) military<br />
potential; and (f) tested learning ability.<br />
Table 1

Correlation Matrix of Potential Predictors and Training Criteria

Variables: 1. CO, 2. INT, 3. FR, 4. LS, 5. LGD, 6. MS, 7. GC, 8. MP, 9. BOTC, 10. M-3, 11. M-4, 12. PPT, 13. PST

[The individual correlation values in the matrix are not legible in the scanned original and are not reproduced here.]
Note. Only correlations significant at the .05 level are reported<br />
in this table. Correlations between NOAB operational predictors<br />
are based on the population of NOAB candidates (n = 743).<br />
Correlations between the NOAB predictors and training criteria are<br />
uncorrected correlations based on the sample of NOAB candidates<br />
attending MARS training (n = 95). CO = conducting officer; INT =<br />
interview; FR = file review; LS = leadership stands; LGD =<br />
leaderless group discussion; MS = merit score; GC = general<br />
classification test; MP = military potential; BOTC = BOTC grade;<br />
M-3 = MARS III training results; M-4 = MARS IV training results;<br />
PPT = passage planning test; PST = problem sensitivity test.<br />
Experimental Predictors<br />
Because the PST and PPT have been incorporated into the NOAB<br />
only recently, there is not yet a sufficient number of candidates<br />
who have completed the two experimental tests at the NOAB and then<br />
gone on to complete MARS occupation training to evaluate the<br />
predictive validity of these tests. In the interim, the concurrent<br />
validity of the tests was evaluated by administering them to a<br />
small group of MARS candidates (n = 43 to 122) already in the<br />
training system. The results of this preliminary research indicate<br />
that the PPT is related to both MARS III (r = .21) and MARS IV<br />
performance (r = .30); however, the PST is not related to either<br />
MARS III or MARS IV training success. As shown in Table 1, the PPT<br />
is unrelated to the file review, suggesting the potential for<br />
contributing incremental criterion prediction beyond that provided<br />
by the file review.<br />
Psychometric Properties of NOAB Predictors<br />
As a result of the inability of some NOAB predictors to<br />
provide criterion prediction, an evaluation of the psychometric<br />
properties of the NOAB exercises was conducted using two<br />
approaches.<br />
Factor analytic approach. The 30 dimensions measured by the<br />
five NOAB exercises were submitted to a principal components<br />
analysis (varimax rotation) which produced a seven-factor solution<br />
shown in Table 2. As illustrated in Table 2, conceptually<br />
independent dimensions underlying the first four NOAB exercises<br />
loaded on exercise factors rather than on factors with conceptually<br />
similar dimensions. It appears that these four exercises are<br />
producing global measures of overall performance on each exercise<br />
and not measuring exercise dimensions, thereby raising doubt about<br />
the construct validity of the dimension ratings that comprise each<br />
of the exercises. Candidates' scores on these exercises may be<br />
more attributable to the procedures followed by the NOAB than to the<br />
candidates' abilities with respect to the dimensions the four<br />
exercises are supposed to be measuring. Table 2 shows that the<br />
file review was the only NOAB exercise that appeared as a<br />
multidimensional construct (it measures three different constructs<br />
-- personal background, military experience, and intelligence).<br />
In addition, the file review was the only NOAB measure that<br />
predicted MARS training performance. The fact that the dimensions<br />
underlying the file review loaded on factors with other<br />
conceptually similar dimensions and did not simply load on a file<br />
review factor provides evidence of construct validity for the<br />
dimension ratings comprising the file review score and may account<br />
for the file review's success as an NOAB predictor.<br />
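The exercise-factor finding is easiest to see in a two-factor miniature (the study itself rotated 30 dimensions to seven factors). For two factors, varimax has Kaiser's closed-form rotation angle. The loading matrix below is hypothetical: six dimension ratings, three per exercise, whose unrotated components show the classic general-plus-bipolar pattern that rotation resolves into two exercise ("method") factors.

```python
import math

def varimax_angle(loadings):
    """Kaiser's closed-form varimax angle for a two-factor loading
    matrix; rows are variables, columns are the two factors."""
    n = len(loadings)
    u = [a * a - b * b for a, b in loadings]   # difference of squared loadings
    v = [2 * a * b for a, b in loadings]       # cross-product term
    A, B = sum(u), sum(v)
    C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    D = sum(2 * ui * vi for ui, vi in zip(u, v))
    return math.atan2(D - 2 * A * B / n, C - (A * A - B * B) / n) / 4

def rotate(loadings, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

# Six hypothetical dimension ratings, three per exercise: a general
# first component plus a bipolar exercise contrast.
unrotated = [(0.5, 0.5)] * 3 + [(0.5, -0.5)] * 3
rotated = rotate(unrotated, varimax_angle(unrotated))
for a, b in rotated:
    print(f"{a:.2f} {b:.2f}")
# After rotation, each variable loads strongly (about .71 in absolute
# value) on its own exercise's factor and near zero on the other --
# exercise factors rather than dimension factors.
```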
Multitrait-multimethod matrix approach. To investigate<br />
further the notion that the interview, leadership stands,<br />
conducting officer's assessment and leaderless group discussions<br />
are actually producing one global measure for each exercise without<br />
regard to the dimensions contained in the exercise, the<br />
correlations of conceptually similar across-exercise dimensions<br />
(similar dimensions measured by different selection exercises) were<br />
evaluated using the method described by Campbell and Fiske (1959).<br />
As reported in Bradley (1990), the correlations between<br />
conceptually similar across-exercise dimensions were lower than<br />
correlations between conceptually independent within-exercise<br />
dimensions, thereby lending further support to the notion that<br />
method variance is contaminating the measurement of NOAB exercise<br />
dimensions.<br />
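The Campbell and Fiske (1959) comparison can be sketched directly: with variables labelled by (dimension, exercise), convergent validities (same dimension, different exercises) should exceed correlations between different dimensions measured by the same exercise. The correlation values below are hypothetical, chosen only to show the reversed pattern reported for the NOAB.

```python
# Hypothetical correlations among four (dimension, exercise) variables:
# 0 self-confidence/interview, 1 communication/interview,
# 2 self-confidence/LGD,       3 communication/LGD.
r = {(0, 1): .62, (2, 3): .58,   # different dimension, same exercise
     (0, 2): .18, (1, 3): .21,   # same dimension, different exercise
     (0, 3): .12, (1, 2): .15}   # different dimension and exercise

def mean_r(pairs):
    return sum(r[p] for p in pairs) / len(pairs)

convergent = mean_r([(0, 2), (1, 3)])     # same-dimension correlations
within_method = mean_r([(0, 1), (2, 3)])  # same-exercise correlations
# Convergent validities well below within-exercise correlations are the
# Campbell-Fiske signature of method variance dominating the ratings.
print(round(convergent, 3), round(within_method, 3))  # 0.195 0.6
```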
Discussion<br />
The results of this validation research can be summarized as<br />
follows: (a) the file review is the only NOAB measure that predicts<br />
both training criteria; (b) the leadership stand assessment<br />
Table 2<br />
Factor Structure of NOAB Dimensional Measures<br />
Interview (exercise factor):<br />
self-confidence .83; presence/bearing .78; verbal expression .69;<br />
enthusiasm .82; desire for MARS .79; suitability for naval role .81;<br />
ability to become naval officer .80<br />
Leadership task (exercise factor):<br />
initiative/decisiveness .88; seeking/accepting advice .50;<br />
preparation and planning .81; communicating effectively .85;<br />
directing others .86; creating team performance .72<br />
Leaderless group discussion (exercise factor):<br />
persuasiveness/forcefulness .84; self-confidence/bearing .80;<br />
communication skills .80; leadership/maintaining the aim .79;<br />
alertness .67<br />
Conducting officer's assessment (exercise factor):<br />
supporting/cooperating with others .77; effectiveness of leadership<br />
behaviour .75; individual effort and drive .81; desire for MARS .74;<br />
suitability for naval environment .80<br />
File review (dimensions load on three factors: personal background,<br />
military experience, and intelligence):<br />
family background; military/para-military experience; military<br />
potential; employment history; educational achievement; tested<br />
learning ability; other activities/interests<br />
[Loadings for the file review dimensions range from .44 to .83; their<br />
assignment to the three factors is not legible in the source scan.]<br />
Note. Only factor loadings greater than .44 are included in this<br />
table.<br />
predicts MARS III training success, but does not improve the<br />
prediction of MARS III beyond that already provided by the file<br />
review, and the leadership stand assessment does not predict MARS<br />
IV training success; (c) the NOAB merit score predicts MARS III and<br />
MARS IV performance, but all of this criterion prediction actually<br />
originates with the file review; (d) neither the interview, the<br />
conducting officer's assessment, the leaderless group discussion, nor the two CFRC<br />
measures -- tested learning ability and military potential -- provides<br />
any prediction of MARS III or MARS IV training success; (e)<br />
the file review is the only NOAB measure that appears to be<br />
psychometrically sound; (f) the other four operational measures<br />
(interview, leadership stands, conducting officer, and leaderless<br />
group discussion) require a psychometric overhaul (or replacement);<br />
and (g) of the two experimental NOAB measures, the PPT has the most<br />
potential for use as an operational NOAB predictor.<br />
Based on this study and the earlier BOTC validation by Okros<br />
et al. (1988), it has been recommended that the NOAB be retained<br />
as the assessment method for selecting MARS candidates and that<br />
efforts be made to improve the board's predictive efficacy by: (a)<br />
increasing the construct validity of exercise dimensions; (b)<br />
investigating the potential for applying situational interview<br />
methods and patterned behavioural interview techniques; (c)<br />
improving the predictive efficacy of the leadership stands; (d)<br />
evaluating the predictive efficacy of the General Classification<br />
(GC) test; and (e) developing new selection measures to replace<br />
the leaderless group discussion and conducting officers'<br />
assessments.<br />
References<br />
Bradley, J.P. (1990). A validation study on the Naval Officer<br />
Assessment Board's ability to predict MARS Officer training<br />
success (Working Paper 90-7). Willowdale, Ontario: Canadian<br />
Forces Personnel Applied Research Unit.<br />
Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant<br />
validation by the multitrait-multimethod matrix.<br />
Psychological Bulletin, 56, 81-105.<br />
Okros, A.C., Johnston, V.W., & Rodgers, M.N. (1988). An evaluation<br />
of the effectiveness of the Naval Officer Selection Board as<br />
a predictor of success on the Basic Officer Training Course<br />
(Working Paper 88-1). Willowdale, Ontario: Canadian Forces<br />
Personnel Applied Research Unit.<br />
A Situational Judgment Test of Supervisory Knowledge in the U.S. Army1<br />
Mary Ann Hanson<br />
Personnel Decisions Research Institutes, Inc.<br />
Walter C. Borman<br />
The University of South Florida and Personnel Decisions Research Institutes, Inc.<br />
A situational judgment test involves presenting respondents with realistic job situations, usually<br />
described in writing, and asking them to respond in a multiple choice format regarding what should be<br />
done in each situation. Situational judgment tests have been developed by other researchers to predict<br />
job performance, especially for management and supervisory positions (e.g., Motowidlo, Carter, & Dunnette,<br />
1989; Tenopyr, 1969).<br />
This paper describes the development, field test, and preliminary construct validation of a situational<br />
judgment test designed to measure supervisory skill for non-commissioned officers (NCOs) in the U.S.<br />
Army. In contrast with most previous research, the Situational Judgment Test (SJT) is a criterion measure<br />
of job performance. It is targeted at first-line supervisors (rank E-5), and is intended to evaluate<br />
the effectiveness of their judgments about what to do in difficult supervisory situations. Thus, the SJT is<br />
somewhat like a job knowledge test for the supervisory part of the job. Although no research is available<br />
on the use of situational judgment tests as criterion measures, there is research available on the usefulness<br />
of written simulations - which are similar to situational judgment tests - as measures of professional<br />
knowledge in fields such as law and medicine. Researchers have found that scores on written simulations<br />
differentiate between groups with differing levels of experience or training and are often related to other<br />
measures of professional knowledge or performance (see Smith, 1983 for a review).<br />
Method<br />
Development of the SJT<br />
Development of the SJT involved asking groups of soldiers similar to the target NCOs (i.e., E-4s and<br />
E-5s) to describe a large number of difficult but realistic situations that Army first-line supervisors face<br />
on their jobs. Once a large number of these situations had been generated, a wide variety of possible<br />
actions (i.e., response alternatives) for each situation were gathered, and ratings of the effectiveness of<br />
each of these actions were collected from both experts (senior NCOs) and the target group (E-5 NCOs in<br />
beginning supervisory positions). These effectiveness ratings were used to select situations and response<br />
alternatives to be included in the SJT. The effectiveness ratings from the senior NCOs (i.e., experts) were<br />
also the basis for the development of SJT scoring procedures. Each of these steps is described in more<br />
detail below.<br />
Participants in the workshops to develop situations and response alternatives were 52 NCOs from<br />
nine different Army posts. Some were NCOs from the target sample and some supervised target NCOs<br />
(ranks ranging from E-5 to E-6). A variation of the critical incident technique (Flanagan, 1954) was used<br />
to collect situations to be used as the item stems. Workshop participants were asked to write descriptions<br />
of difficult supervisory situations that they or their peers had experienced as first-line supervisors in the<br />
Army. This resulted in a pool of about 300 situations. Response alternatives were primarily generated by<br />
presenting participants in later workshops with the situations that had been collected and asking them to<br />
write, in two or three sentences, what they would do to respond effectively in that situation. This resulted<br />
in about 15 possible responses for each situation. These responses were content analyzed and grouped to<br />
reduce redundancies. The final result was four to ten response alternatives for each situation, with a<br />
mean of about six response alternatives.<br />
1 This research was funded by the U.S. Army Research Institute for the Behavioral and Social Sciences,<br />
Contract No. MDA903-82-C-0531. All statements expressed in this paper are those of the authors and do not<br />
necessarily reflect the official opinions or policies of the U.S. Army Research Institute or those of the Department<br />
of the Army.<br />
One-hundred and eighty of the most promising situations were then chosen based on their content<br />
(e.g., appropriately difficult, realistic, etc.) and the number of plausible response alternatives available.<br />
For each of these 180 situations retained, information concerning the effectiveness of the various response<br />
alternatives was collected from two groups: a group of expert NCOs and a group of<br />
NCO job incumbents from the target population. The expert NCOs were 90 students and instructors at the United States<br />
Army Sergeants Major Academy. These NCOs were among the highest ranking enlisted soldiers in the<br />
Army (rank of E-8 to E-9), and all had extensive experience as supervisors in the Army. The target<br />
NCOs were 344 second-tour soldiers (rank of E-4 to E-5) who were participating in a field test of a group<br />
of job performance measures at several Army posts in the United States and Europe. For each SJT situation,<br />
these respondents were asked to rate the effectiveness of each response alternative on a seven-point<br />
scale (1 = least and 7 = most effective). Because there were 180 situations and testing time was limited, each<br />
soldier responded to only a subset of the situations. This resulted in about 25 expert NCO and 45<br />
incumbent NCO responses per situation.<br />
Items (situations) for the field test version of the SJT and response alternatives for these items were<br />
then selected based on these data. The following criteria were used to select 35 of these situations and<br />
from 3-5 response alternatives for each situation: 1) the expert group had high agreement concerning the<br />
most effective response for the item; 2) the item was difficult for the incumbents (i.e., agreement was<br />
substantially lower than for the expert group); 3) the difference between the expert and the incumbent<br />
responses for each situation was judged to reflect an important aspect of supervisory knowledge; and 4)<br />
the content of the final group of situations was as representative as possible of the first-line supervisory<br />
job in the Army.<br />
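The first two selection criteria lend themselves to a simple screen. The agreement cutoffs and the choice data below are hypothetical (the paper reports no numeric thresholds); the sketch only shows the shape of the filter: keep items where experts converge on one alternative but incumbents scatter.

```python
def modal_agreement(choices):
    """Proportion of raters whose choice matches the modal choice."""
    return max(choices.count(c) for c in set(choices)) / len(choices)

# Hypothetical (expert_choices, incumbent_choices) per candidate item;
# each entry is the index of the alternative a rater picked as best.
items = {
    "item_a": ([1] * 22 + [3] * 3, [1] * 15 + [2] * 15 + [3] * 15),
    "item_b": ([2] * 13 + [0] * 12, [2] * 30 + [0] * 15),
    "item_c": ([0] * 24 + [1], [0] * 40 + [1] * 5),
}
EXPERT_MIN, INCUMBENT_MAX = 0.80, 0.50   # assumed cutoffs, for illustration
selected = [name for name, (experts, incumbents) in items.items()
            if modal_agreement(experts) >= EXPERT_MIN
            and modal_agreement(incumbents) <= INCUMBENT_MAX]
print(selected)  # ['item_a']: experts agree (.88) but incumbents scatter (.33)
```

Item b fails because experts disagree; item c fails because incumbents also find it easy, so it cannot separate more from less knowledgeable supervisors.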
Field Test of the SJT<br />
The field test of the SJT had three major objectives. The first objective was to explore different<br />
methods of scoring the SJT. The second objective was to examine and evaluate the psychometric properties<br />
of this instrument. The final objective was to obtain preliminary information concerning the construct<br />
validity of the SJT as a criterion measure of supervisory job knowledge.<br />
The SJT was administered as part of a larger data collection effort to a sample of 1049 NCOs (most<br />
were E-4s and E-5s) at a variety of posts in the United States and Europe. For each of the 35 SJT items,<br />
these soldiers were asked to place an “M” next to the response alternative they thought was the most<br />
effective and an “L” next to the response alternative they thought was the least effective.<br />
Scoring Procedures. Several different procedures for scoring the SJT were explored. The most<br />
straightforward was a simple number correct score. For each item, the response alternative that had been<br />
given the highest mean effectiveness rating by the experts (senior NCOs) was designated the “correct”<br />
answer. Respondents were then scored based on the number of items for which they indicated that this<br />
“correct” response alternative was the most effective. The second scoring procedure involved weighting<br />
each response alternative chosen by soldiers as the most effective by the mean effectiveness rating given<br />
to that response alternative by the expert group. This gives respondents more credit for choosing<br />
“wrong” answers that are relatively effective than for choosing wrong answers that are very ineffective.<br />
These item level effectiveness scores were then averaged to obtain an overall effectiveness score for each<br />
soldier. Averaging these item level scores instead of simply summing them placed respondents’ scores<br />
on the same 1 to 7 effectiveness scale as the experts’ ratings and ensured that respondents were not penalized<br />
for any missing data (up to 10% missing responses were allowed).<br />
Scoring procedures based on respondents’ choices for the least effective response to each situation<br />
were also explored. The ability to identify the least effective response alternatives might be seen as an<br />
indication of respondents’ ability to avoid these very ineffective responses or in effect to avoid “screwing<br />
up”. As with the choices for the most effective response, a simple number correct score was computed:<br />
the number of times each respondent correctly identified the response alternative that the experts rated<br />
the least effective. In order to differentiate this score from the number correct score based on choices for<br />
the most effective response, this score will be referred to as the L-Correct score, and the score based on<br />
choices for the most effective response (described previously) will be referred to as the M-Correct score.<br />
Another score was computed by weighting respondents’ choices for the least effective response alternative<br />
by the mean effectiveness rating for that response, and then averaging these item level scores to<br />
obtain an overall effectiveness score based on choices for the least effective response alternative. This<br />
score will be referred to as L-Effectiveness, and the parallel score based on choices for the most effective<br />
responses (described previously) will be referred to as M-Effectiveness.<br />
Finally, a scoring procedure that involved combining the choices for the most and the least effective<br />
response alternative into one overall score was also explored. For each item, the mean effectiveness of<br />
the response alternative each soldier chose as the least effective was subtracted from the mean effectiveness<br />
of the response alternative they chose as the most effective. Because it is actually better if<br />
respondents indicate that less effective response alternatives are the least effective, this score can be seen<br />
as a sum or composite of the two effectiveness scores described previously (i.e., subtracting a negative<br />
number from a positive number is the same as adding the absolute values of the two numbers). These<br />
item level scores were then averaged together for each soldier to generate yet another score, and this<br />
score will be referred to as M-L Effectiveness.<br />
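The five scoring procedures can be summarized in a short sketch. The expert mean ratings and the soldier's answers below are hypothetical, but the arithmetic follows the procedures just described (number-correct counts, expert-mean weighting, and the M-L difference); the per-item missing-data handling is omitted.

```python
# expert_means[i][a]: experts' mean 1-7 effectiveness rating of
# alternative a on item i (hypothetical three-item test).
expert_means = [
    [6.1, 3.9, 2.2, 1.5],
    [2.4, 5.8, 4.0, 1.9],
    [3.1, 2.7, 6.3, 4.4],
]
best = [m.index(max(m)) for m in expert_means]    # experts' "correct" answers
worst = [m.index(min(m)) for m in expert_means]

def score_sjt(responses):
    """responses[i] = (most_idx, least_idx) marked by one soldier."""
    n = len(responses)
    m_eff = sum(expert_means[i][m] for i, (m, _) in enumerate(responses)) / n
    l_eff = sum(expert_means[i][l] for i, (_, l) in enumerate(responses)) / n
    return {
        "M-Correct": sum(m == best[i] for i, (m, _) in enumerate(responses)),
        "L-Correct": sum(l == worst[i] for i, (_, l) in enumerate(responses)),
        "M-Effectiveness": m_eff,   # averaged onto the experts' 1-7 scale
        "L-Effectiveness": l_eff,   # lower is better
        "M-L Effectiveness": m_eff - l_eff,
    }

s = score_sjt([(0, 3), (1, 3), (2, 0)])   # this soldier misses one "least"
print(s["M-Correct"], s["L-Correct"], round(s["M-L Effectiveness"], 2))  # 3 2 3.9
```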
Descriptive Statistics. Descriptive statistics and internal consistency reliability estimates (KR-20)<br />
were computed for each of the five scoring procedures. Intercorrelations were also computed among the<br />
five scores generated by the five different scoring procedures.<br />
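For dichotomous scores such as M-Correct and L-Correct, the KR-20 estimate has a direct formula. A minimal sketch with toy 0/1 data (not the field-test responses):

```python
def kr20(item_scores):
    """KR-20 internal consistency for dichotomous (0/1) item scores:
    (k/(k-1)) * (1 - sum(p_i * q_i) / var(total)), population variance."""
    k = len(item_scores[0])
    n = len(item_scores)
    totals = [sum(row) for row in item_scores]
    mean_t = sum(totals) / n
    var_total = sum((t - mean_t) ** 2 for t in totals) / n
    p = [sum(row[i] for row in item_scores) / n for i in range(k)]
    return k / (k - 1) * (1 - sum(pi * (1 - pi) for pi in p) / var_total)

# Toy data: four soldiers by three items (1 = matched the experts' answer).
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data))  # 0.75
```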
Preliminary Information Concerning Construct Validity<br />
The data from this field test were also used to obtain preliminary information concerning the construct<br />
validity of the SJT as a criterion measure of supervisory job knowledge. As mentioned previously, collecting<br />
the field test data for the SJT was a part of a larger data collection effort. Several other job performance<br />
measures were administered concurrently with the SJT, including job knowledge tests, a self-report<br />
administrative information survey, and supervisory simulation exercises (involving training a subordinate,<br />
disciplinary counseling, and personal counseling). Performance ratings were also collected from<br />
peers and supervisors using behavior-based rating scales. If the SJT is a valid measure of supervisory job<br />
knowledge, certain relationships would be expected with these other measures. For example, it should<br />
have at least moderate correlations with the scores on the supervisory simulations and performance ratings<br />
on supervisory dimensions. Correlations of SJT scores with several of these other job performance<br />
measures were examined.<br />
Another type of information that was used to assess the construct validity of the SJT was the extent<br />
to which the knowledges assessed by the SJT are learned on the job. If the SJT is a valid measure of job<br />
knowledge, soldiers who have more experience or training would be expected, on average, to obtain<br />
higher scores than soldiers with less experience or training. Self report information was collected from<br />
the soldiers in this field test sample concerning whether or not they had attended any supervisory training<br />
and how regularly they were required to supervise other soldiers. Mean SJT scores for soldiers with<br />
different levels of training and experience were also examined.<br />
Results<br />
Field Test Results<br />
Table 1 presents the mean score for each of the five scoring procedures. The maximum possible for<br />
the M-Correct scoring procedure is 35 (i.e., all 35 items answered correctly), but the mean score obtained<br />
by soldiers in this sample was only 16.25. The maximum score obtained was only 27. The mean number<br />
of least effective response alternatives correctly identified by this group was only 14.86. Clearly the SJT<br />
was difficult for this group of soldiers.<br />
Table 1 also presents the standard deviation for each of the five scoring procedures, and all of the<br />
scoring procedures resulted in a reasonable amount of variability in scores obtained by the soldiers in this<br />
sample. Table 1 also shows that the internal consistency reliabilities for all of these scoring procedures<br />
are quite high. The most reliable score is M-L Effectiveness, probably because this score contains more<br />
information than the other scores (i.e., choices for both the most and least effective response).<br />
Table 2 presents the intercorrelations among scores obtained using the five different scoring procedures.<br />
These intercorrelations range from moderate to very high. Correlations between scores that are<br />
based on the same set of responses (e.g., M-Correct with M-Effectiveness) are higher than correlations<br />
between scores that are based on different sets of responses (e.g., M-Correct with L-Correct). The correlations<br />
between L-Effectiveness and the other scores are negative, because lower L-Effectiveness scores<br />
are actually better. The high (negative) correlation between M-Effectiveness and L-Effectiveness seems<br />
to indicate that these two scores measure similar or related constructs.<br />
Table 1<br />
Situational Judgment Test Means, Standard Deviations, and Internal Consistencies<br />
Scoring Procedure        N      Mean    SD    Internal Consistency Reliability¹<br />
M-Correct                1025³  16.52   4.29  .50<br />
M-Effectiveness          1025³   4.91    .34  .68<br />
L-Correct                1007³  14.86   3.86  .57<br />
L-Effectiveness          1007³   3.54²   .31  .68<br />
M-L Effectiveness        1007³   1.36    .61  .75<br />
¹ KR-20.<br />
² Low scores indicate higher performance.<br />
³ Soldiers with more than 10% incomplete or invalid data were omitted from these analyses.<br />
Table 2<br />
Situational Judgment Test Score Intercorrelations for the Five Scoring Procedures<br />
           M-Eff.  L-Correct  L-Eff.  M-L Eff.<br />
M-Correct   .94      .52      -.44     .86<br />
M-Eff.      --       .59      -.70     .93<br />
L-Correct   --       --       -.86     .78<br />
L-Eff.      --       --        --     -.92<br />
Note. Sample sizes range from 1007 to 1025.<br />
The M-Correct and L-Correct scores have less desirable psychometric properties than the scores<br />
obtained using the other three scoring procedures. In addition, these two scores contain information that<br />
is very similar to the information provided by the M-Effectiveness and L-Effectiveness scores respectively,<br />
because they are based on the same sets of responses. Thus, results reported for the remainder of the<br />
analyses will not include these two scores.<br />
Preliminary Information Concerning Construct Validity<br />
Table 3 shows the correlations of the three remaining SJT scores with scores from the other job<br />
performance measures. The SJT scores correlate moderately with a composite of scores on the three<br />
supervisory simulations. The SJT scores also have moderate correlations with the performance rating<br />
composite called Leading/Supervising. Correlations with the other performance rating composites are<br />
slightly lower. Correlations with scores on the job knowledge tests are quite high, but this is not surprising<br />
in view of the fact that these are also paper-and-pencil tests. Finally, the SJT scores have moderate<br />
correlations with a variable called “grade deviation score”, which is essentially promotion rate. Promotion<br />
rate might be seen as an overall measure of success as a soldier.<br />
Table 3<br />
Correlations Between SJT Scores and Other Job Performance Measures<br />
Measure                            M-Eff.  L-Eff.  M-L Eff.<br />
Leading/Supervising³                .24    -.18     .22<br />
Technical Performance³              .21    -.17     .21<br />
Personal Discipline³                .20    -.15     .18<br />
Effort/Military Bearing³            .11    -.06     .10<br />
Job Knowledge¹                      .40    -.34     .40<br />
Grade Deviation Score²              .20    -.20     .22<br />
Supervisory Simulation Composite⁴   .20    -.16     .20<br />
¹ Weighted mean across nine MOS; sample size per MOS ranges from 38 to 146.<br />
² This variable is essentially promotion rate; sample sizes range from 849 to 919.<br />
³ Performance rating composites, based on pooled peer and supervisor ratings. Sample sizes range from 855 to 907; a correlation of .07 is significant at the .05 level.<br />
⁴ Composite of scores from three simulations: personal counseling, disciplinary counseling, and training. Sample<br />
sizes range from 873 to 909; a correlation of .07 is significant at the .05 level.<br />
Table 4 shows the mean SJT scores of soldiers who reported various levels of supervisory training.<br />
Soldiers who had attended no supervisory school at all scored almost half a standard deviation lower<br />
than those who had attended one or more supervisory schools. One potential confound in this comparison<br />
is that the opportunity to attend supervisory schools varies, and decisions concerning which soldiers<br />
are given the opportunity to attend these schools may be influenced by their effectiveness as soldiers or<br />
as supervisors. As a result, it is possible that these mean SJT score differences were obtained because the<br />
more effective soldiers were given the opportunity to attend supervisory training. However, regardless of<br />
whether these differences are the result of differential opportunities or training in the relevant supervisory<br />
skills, these mean score differences provide some support for the construct validity of the SJT as a measure<br />
of supervisory skill.<br />
Mean SJT scores are also reported in Table 4 for subgroups of soldiers identified by how frequently<br />
they reported supervising other soldiers. For all three SJT scoring procedures the expected pattern was<br />
found; soldiers who reported that they supervised other soldiers more frequently obtained better SJT<br />
scores. The largest difference is for the L-Effectiveness score. Soldiers who reported that they regularly<br />
supervise other soldiers obtained L-Effectiveness scores almost half a standard deviation better (i.e.,<br />
lower) than those of soldiers who reported that they never supervise other soldiers. These results for<br />
supervisory experience are slightly different from those obtained for supervisory training, where the<br />
largest mean differences were found for the M-Effectiveness score. Perhaps this is because supervisory<br />
experience sometimes involves making mistakes and learning from the consequences of these mistakes<br />
(i.e., learning to identify ineffective responses), but supervisory training is more likely to focus on the<br />
identification of effective supervisory responses.<br />
Table 4<br />
Mean Situational Judgment Test Scores for Soldiers With Different Levels of<br />
Supervisory Training and Experience<br />
                                          N        M-Eff.  L-Eff.  M-L Eff.<br />
Attended one or more supervisory schools  560-603  4.97    3.50    1.47<br />
Attended no supervisory school            327-371  4.81    3.62    1.20<br />
How often required to supervise other soldiers:<br />
Never                                     87-199   4.87    3.63    1.23<br />
Sometimes fill in for regular supervisor  294-327  4.86    3.58    1.29<br />
Often fill in for regular supervisor      125-135  4.90    3.53    1.38<br />
Regularly supervise other soldiers        391-415  4.96    3.49    1.47<br />
Conclusions<br />
The results of the field test of the SJT indicate that this test is appropriately difficult for the target<br />
sample. The five scoring procedures that were explored all resulted in scores with a reasonable amount<br />
of variance among the soldiers in this sample. Internal consistency reliabilities were also quite high.<br />
Based on all of the psychometric properties examined, the most promising score appears to be M-L Effectiveness,<br />
which has an internal consistency reliability of .75.<br />
The preliminary information obtained concerning the construct validity of the SJT provides evidence<br />
that the SJT is a valid measure of supervisory job knowledge. The correlations of SJT scores with the<br />
other job performance measures provide some support for this interpretation, although the<br />
SJT also has moderate correlations with several measures of technical performance and with promotion<br />
rate. Mean SJT scores for soldiers with different levels of supervisory experience and training indicate<br />
that the knowledge or skill measured by the SJT is, to some extent, learned on the job and in supervisory<br />
training.<br />
REFERENCES<br />
Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin, 51, 327-358.<br />
Motowidlo, S. J., Dunnette, M. D., & Carter, G. W. (in press). An alternative selection procedure: The<br />
low-fidelity simulation. Journal of Applied Psychology.<br />
Smith, I. L. (1983). Use of written simulations in credentialing programs. Professional Practice of<br />
Psychology, 4, 21-50.<br />
Tenopyr, M. L. (1969). The comparative validity of selected leadership scales relative to success in<br />
production management. Personnel Psychology, 22, 77-85.<br />
Context Effects on Multiple-Choice Test Performance<br />
Lawrence S. Buck*<br />
Planning Research Corporation, System Services<br />
Introduction<br />
It has long been a tenet of test construction theory and practice that test items<br />
measuring the same content or behavioral objectives should be grouped within a<br />
test. For example, Tinkelman (1971) stated:<br />
If items measuring different content objectives or different behavioral<br />
objectives are included in the same test, consideration should be given to<br />
grouping the items by type. Usually the continuity of thought that such<br />
grouping allows on the part of the examinee is found to enhance the<br />
quality of his/her performance.<br />
Other rationales for grouping similar items include such viewpoints as: test anxiety<br />
may be reduced by grouping items on a test, examinees will concentrate better if<br />
they do not jump from subject to subject, and examinees might glean information<br />
from certain questions in a set of questions that will facilitate the answering of other<br />
questions in the set (Gohmann & Spector, 1989).<br />
A majority of the studies addressing item positioning have centered on the effects of<br />
ordering questions by difficulty level rather than by content. (For a representative<br />
sample, see: Hodson, 1984; Sax & Cromack, 1966; Leary & Dorans, 1985; and Plake,<br />
1980.) Numerous other studies, primarily in the educational arena, have addressed<br />
the effects of randomizing items in tests rather than presenting the items in the<br />
order that the information is covered in the classroom or in the textbook(s). (For a<br />
representative sample, see: Gohmann & Spector, 1989; Taub & Bell, 1975; and<br />
Bresnock, Graves, & White, 1989).<br />
The primary focus of this study is the effect on part and total test performance of<br />
randomizing the items on multiple-choice tests normally constructed with the items<br />
grouped by content areas or domains. A secondary objective was to evaluate the<br />
effects on the individual item statistics. The items in the tests in question are<br />
normally presented from easiest to most difficult within each domain.<br />
Two tests were selected for this study, Rigging and Weight <strong>Testing</strong> (BM-0110) and<br />
Outside Electrical (EM-4613). These tests are part of a testing program which<br />
develops, administers, and maintains Journeyman Navy Enlisted Classification (JNEC)<br />
exams for the Navy’s Intermediate Maintenance Activity (IMA) community. The tests<br />
are part of the qualification process for special classification codes. Both the BM-<br />
0110 and EM-4613 examinations consist of 120, four-choice, multiple-choice test<br />
questions spread across six domains as indicated in Table I below.<br />
Table I<br />
Test Item-Domain Breakdown<br />
Domains<br />
Test # of Items 1 2 3 4 5 6<br />
BM-0110 120 18 30 14 12 30 16<br />
EM-4613 120 10 6 14 55 22 13<br />
*The author wishes to thank Norma Molina-Jaggard for her able assistance with the data analyses.<br />
For each administration, the tests were generated with a total test and each domain<br />
mean difficulty index (p-value) of .60. The tests are essentially power tests with<br />
three hours allowed. The cutting score for each test is based on 62.5% of the<br />
number of test questions (a score of 75) or the group mean, whichever is higher. The<br />
cutting score was 75 for each of the tests for each administration. The test items<br />
were selected in accordance with the following parameters: p-values between .25<br />
and .90 and biserials between .15 and .99.<br />
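The cutting-score rule and item-selection window just described can be expressed directly. This is a sketch under stated assumptions -- the function names and the item-screening logic are illustrative, not part of the actual TPS:

```python
# Item-selection window quoted above: p-values between .25 and .90,
# biserials between .15 and .99. (Hypothetical helper, not TPS code.)
def eligible(p_value, biserial):
    return 0.25 <= p_value <= 0.90 and 0.15 <= biserial <= 0.99

# Cutting score: 62.5% of the item count, or the group mean, whichever is higher.
def cutting_score(n_items, group_mean):
    return max(0.625 * n_items, group_mean)

assert eligible(0.60, 0.40)
assert not eligible(0.95, 0.40)   # too easy to be selected
print(cutting_score(120, 71.3))   # -> 75.0 (62.5% of 120 dominates)
```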
The tests are administered twice yearly, in the spring and fall, to enlisted Navy<br />
personnel in pay grades E-5 through E-9, with a minimum of nine months experience<br />
in an IMA activity. BM-0110 was developed in the summer of 1987 and placed into<br />
operational use in the fall of 1987. EM-4613 was developed in the fall of 1987 and<br />
placed into operational use in the spring of 1988. All tests in this program were<br />
developed by subject-matter experts from each trade under the tutelage of a testing<br />
specialist. All of the tests are computer generated by an automated test processing<br />
system (TPS) that includes item banking, scoring, and analysis and updating of all<br />
test and item data.<br />
Procedure<br />
Three different administrations -- Spring 1989 (1-89), Fall 1989 (2-89), and Spring<br />
1990 (1-90) -- were used for this study for both the BM-0110 and EM-4613 tests. Both<br />
the 1-89 and 1-90 tests were constructed under normal procedures, i.e., with items<br />
grouped by domain and presented from easiest to most difficult within each<br />
domain. For the 2-89 administrations, the test items were randomized without<br />
regard for content area or difficulty level.<br />
The items for each administration were generated by the TPS from the total item<br />
pool available for each test and therefore the items were not identical across<br />
administrations. Table II presents the number of items common to each pair of test<br />
administrations.<br />
Table II<br />
Common Items Between Administrations<br />
              1-89 - 2-89    1-89 - 1-90    2-89 - 1-90<br />
BM-0110            71             77             89<br />
EM-4613            66             67             67<br />
Under ideal conditions, the research design would have used the same items for each<br />
administration and both forms of the test would have been administered at the<br />
same time. However, due to a number of factors including fairly small N’s and<br />
numerous repeat candidates from one test administration to another, the ideal<br />
design was not possible. The test populations do tend to be quite stable from one<br />
administration to another, however, in terms of trade experience and numbers from<br />
each paygrade.<br />
The test results and item statistics from each administration for each test were<br />
compared with the other administrations from four different perspectives -- total<br />
test results, part test scores, common item comparisons, and individual item<br />
statistics. As previously stated, the objectives were to determine if randomizing the<br />
items would have any effect on total test performance, part (domain) test<br />
performance, and individual item statistics. A variety of statistical procedures were<br />
employed to analyze the data including Z-tests, two-tailed t-tests, and ANOVAs.<br />
Results<br />
Total Test Performance. With respect to total test performance, the test results<br />
were quite consistent from administration to administration as reflected in Table III.<br />
The 2-89 administration seems to be a little easier for both the BM-0110 and<br />
EM-4613 tests although the differences are small. The test reliabilities also remained<br />
reasonably consistent across test administrations.<br />
Table III<br />
Summary Test Statistics<br />
BM-0110 EM-4613<br />
A Z-test was applied to the mean test scores between paired comparisons, i.e., 1-89<br />
with 2-89, etc., and all results were nonsignificant at the .05 level. In this respect, we<br />
were unable to reject the null hypothesis for any comparison. An ANOVA was also<br />
calculated across each of the three administrations and the results were not<br />
significant at the .05 level for either the BM-0110 (F[2,359] = 1.183) or EM-4613<br />
(F[2,359] = .028).<br />
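A paired-comparison Z-test of this kind can be reproduced in a few lines. The score arrays below are invented stand-ins for two administrations, not the actual BM-0110 or EM-4613 data:

```python
import math

# Two-sample Z-test on mean test scores (large-sample approximation),
# as applied to each pair of administrations. Scores below are invented.
def z_test(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)   # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

spring_89 = [74, 78, 71, 80, 76, 69, 77, 73]
fall_89   = [75, 79, 72, 81, 78, 70, 76, 74]
z = z_test(spring_89, fall_89)
# |z| < 1.96 -> nonsignificant at the .05 level (two-tailed)
print(round(z, 2), abs(z) < 1.96)
```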
Table IV below presents another way of comparing the overall test results as the<br />
passing rates by paygrade are presented for each administration. The passing rates<br />
are reasonably consistent across test administrations with somewhat higher<br />
percentages passing for the 2-89 test. These results are not inconsistent with the<br />
test results from other tests in the program where some fluctuations occur but the<br />
passing rates remain fairly consistent for each paygrade.<br />
Table IV<br />
Test Results by Paygrade<br />
BM-0110<br />
                 1-89                2-89                1-90<br />
Paygrade    N  Passing  %      N  Passing  %      N  Passing  %<br />
E-5        34    13    38     26    11    42     21     7    33<br />
E-6         9     5    56     10     6    60     12     5    42<br />
E-7         5     2    40      2     1    50      1     1   100<br />
E-8 & E-9   0     -     -      0     -     -      0     -     -<br />
TOTALS     48    20    42     38    18    47     34    13    38<br />
Table IV cont.<br />
EM-4613<br />
                 1-89                2-89                1-90<br />
Paygrade    N  Passing  %      N  Passing  %      N  Passing  %<br />
E-5        43    16    37     42     9    21     25     6    24<br />
E-6        44    18    41     42    22    52     31    12    39<br />
E-7        11     5    45      8     7    88      4     2    50<br />
E-8 & E-9   1     0     0      1     1   100      2     2   100<br />
TOTALS     99    39    37     93    39    42     62    22    35<br />
Part Test Performance. In addition to evaluating any effects on total test<br />
performance of randomizing the items, it was also considered prudent to consider<br />
any effects on domain performance. As indicated in Table V below, the results are<br />
similar to those reported in Table III for total test performance. That is, the average<br />
domain scores are quite consistent across test administrations with the 2-89<br />
administration being somewhat easier for almost all domains across the three<br />
administrations.<br />
Table V<br />
Average Domain Scores<br />
BM-0110 EM-4613<br />
Randomized complete block design ANOVAs were computed for the domain scores<br />
across the three administrations of each test and the results were not significant for<br />
either the BM-0110 or EM-4613, (F[2,17] = 2.36) and (F[2,17] = .015) respectively.<br />
Common Item Comparisons. Since it was not possible to use the same items in total<br />
for each of the three test administrations, it was also necessary to evaluate the<br />
effect, if any, on the subset of common items for each paired comparison. A two-tailed<br />
t-test was used to analyze the items common to each pair of administrations<br />
and all results for both the BM-0110 and EM-4613 were nonsignificant at the .05<br />
level. In addition, ANOVAs were calculated for each of the three administrations of<br />
the BM-0110 and EM-4613 tests and the results failed to reveal any significant<br />
differences at the .05 level of significance, (F[2,74] = .044) and (F[2,146] = .720)<br />
respectively.<br />
Individual Item Statistics. The issue of any effect on item statistics of varying the<br />
item’s position was investigated by comparing the item difficulty indexes (p-values)<br />
of common items in each pair of test administrations as well as the item<br />
discrimination values (biserials). That is, does presenting the items in other than<br />
their normal domain and without regard to difficulty level, have an effect on the<br />
items’ statistics? Table VI presents the average p-value changes for the common<br />
items between the paired test administrations. The first test in each pair served as<br />
the base item position for comparative purposes. As indicated in Table VI, the<br />
average item p-values showed a somewhat greater tendency to increase (items<br />
easier) than to decrease, although the differences are small. The average overall<br />
change in the items’ p-values remained quite consistent across the three pairs of test<br />
administrations for both the BM-0110 and the EM-4613.<br />
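The common-item comparison amounts to differencing p-values for items shared by two administrations. A minimal sketch, with hypothetical item IDs and p-values (not drawn from the actual item bank):

```python
# Average p-value change for items common to two administrations.
# The item dictionaries below are invented; keys are hypothetical bank IDs.
base = {"BM017": 0.62, "BM042": 0.55, "BM108": 0.71, "BM111": 0.48}
new  = {"BM017": 0.70, "BM042": 0.51, "BM108": 0.78, "BM130": 0.60}

common = sorted(set(base) & set(new))
changes = [new[i] - base[i] for i in common]
avg_abs_change = sum(abs(c) for c in changes) / len(changes)
increases = [c for c in changes if c > 0]   # item became easier
decreases = [c for c in changes if c < 0]   # item became harder
print(common, round(avg_abs_change, 3), len(increases), len(decreases))
```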
Table VI<br />
Comparison of Common Items’ P-Values<br />
Average P-Value Change By Relative Position<br />
EM-4613<br />
Paired              Average    Average      Average<br />
Administrations     Change     Increase*    Decrease*<br />
1/89 with 1/90       .095      .084 .075    .107 .098<br />
2/89 with 1/89       .091      .105 .079    .092 .121<br />
2/89 with 1/90       .099      .095 .117    .087 .112<br />
BM-0110<br />
1/89 with 1/90       .222      .199 .055    .199 .071<br />
2/89 with 1/89                 .246 .091    .198 .068<br />
2/89 with 1/90       .213      .227 .087    .190 .082<br />
*The first column represents the average for the first test of each pair;<br />
the second column represents the second test.<br />
With respect to the items’ biserials, Table VII presents the average biserials for the<br />
common items in each pair of test administrations. As was the case with the<br />
p-values, the average biserials were quite consistent between paired test<br />
administrations, with the differences quite small.<br />
Table VII<br />
Average Biserials for Common Items<br />
of Paired Test Administrations<br />
                    EM-4613        BM-0110<br />
1/89 with 2/89     .29  .25       .30  .34<br />
1/89 with 1/90     .32  .25       .32  .28<br />
2/89 with 1/90     .21  .22       .32  .26<br />
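A biserial of the kind tabled above is the point-biserial item-total correlation rescaled through the normal ordinate. The sketch below assumes dichotomous (1/0) item responses and uses invented data, not items from either examination:

```python
from statistics import NormalDist, mean, pstdev

# Biserial item discrimination: (M1 - M0)/s * (p*q)/phi(z), where phi(z) is
# the normal density at the p-th quantile. Responses and totals are invented.
def biserial(item, totals):
    p = mean(item)                    # item p-value (proportion correct)
    q = 1 - p
    m1 = mean(t for i, t in zip(item, totals) if i == 1)
    m0 = mean(t for i, t in zip(item, totals) if i == 0)
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (m1 - m0) / pstdev(totals) * (p * q) / y

item   = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
totals = [82, 64, 70, 88, 65, 59, 90, 58, 71, 75]
print(round(biserial(item, totals), 2))
```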
Discussion<br />
This study failed to show that randomizing the items in a multiple-choice test would<br />
have a deleterious effect on examinees with respect to test performance. If<br />
anything, the randomized tests were somewhat easier although the differences<br />
were small and were not significant. The effects on item statistics were minimal as<br />
the item difficulty indices (p-values) showed no clear trend of increasing or<br />
decreasing when comparing randomized vs. nonrandomized tests, and the item<br />
discrimination values (biserials) remained quite consistent across test<br />
administrations.<br />
Within the confines of this study, it was not possible to assess examinee reaction to<br />
the different test formats to discern whether the different item presentations were<br />
perceived differently by the examinees. Nor was it possible to determine whether<br />
examinees answer questions in order or tend to skip around and group like<br />
questions even though they are not grouped on the test. Studies by Tuck (1978) and<br />
Allison and Thomas (1986) have suggested that few examinees answer questions in<br />
order and that there is a tendency to group similar items.<br />
The study supports the stability of item statistics across different test formats and<br />
administrations and the lack of any significant contextual or item position effects on<br />
test performance. The implication of these findings is that, to preclude the possibility<br />
of cheating, randomized versions of the same tests could be administered without<br />
fear of creating an unfair advantage or disadvantage.<br />
References<br />
Allison, D.E., and D.C. Thomas. 1986. Item-difficulty sequence in achievement<br />
examinations: Examinees’ preferences and test taking strategies. Psychological<br />
Reports 59, 867-870.<br />
Bresnock, A.E., P.E. Graves, and N. White. 1989. Multiple-choice testing: Questions<br />
and response position. Journal of Economic Education, (Summer), 239-245.<br />
Gohmann, S.F., and L.C. Spector. 1989. Test scrambling and student performance.<br />
Journal of Economic Education, (Summer), 235-238.<br />
Hodson, D. 1984. The effect of changes in item sequence on student performance<br />
in a multiple-choice chemistry test. Journal of Research in Science Teaching, Vol.<br />
21, No. 5, 489-495.<br />
Leary, L.F., and N.J. Dorans. 1985. Implications for altering the context in which test<br />
items appear: A historical perspective on an immediate concern. Review of<br />
Educational Research 55, (Fall), 387-413.<br />
Plake, B.S. 1980. Item arrangement and knowledge of arrangement on test scores.<br />
Journal of Experimental Education 49, (Fall), 56-58.<br />
Sax, G., and T.A. Cromack. 1966. The effects of various forms of item arrangements<br />
on test performance. Journal of Educational Measurement 3, 309-311.<br />
Taub, A.J., and E.B. Bell. 1975. A bias in scores on multiple-form exams. Journal of<br />
Economic Education 7, (Fall), 58-59.<br />
Tinkelman, S.N. Planning the objective test. In R.L. Thorndike (Ed.), Educational<br />
Measurement (2nd. ed.). Washington, D.C.: American Council on Education,<br />
1971.<br />
Tuck, J.P. 1978. Examinee’s control of item difficulty sequence. Psychological<br />
Reports 42, 1109-1110.<br />
ABSTRACT<br />
DIETARY EFFECTS ON TEST PERFORMANCE<br />
Charles A. Salter<br />
Laurie S. Lester<br />
Susan M. Luther<br />
Theresa A. Luisi<br />
U.S. Army Natick Research, Development & Engineering Center<br />
Natick, MA<br />
Previous research suggests that meal composition may affect performance<br />
on the automated Memory and Search Task (MAST). The purpose of this study<br />
was to determine if lunch protein or carbohydrate would interact with caffeine to<br />
affect performance and mood as assessed by the MAST, the Automated Portable<br />
Test System (APTS), and visual-analogue mood scales. Male subjects were<br />
assigned either to a protein lunch (5 g/kg turkey breast) or a carbohydrate lunch (5<br />
g/kg sorbet) group so that normal caffeine intakes were equivalent. Within each<br />
group, subjects rotated through two caffeine conditions in a counterbalanced order,<br />
drinking two cups of either caffeinated or decaffeinated coffee with lunch. Caffeine<br />
use was prohibited at other times during the study. The APTS was consistent with<br />
the MAST in showing no performance effects of protein and caffeine, though<br />
protein did correlate with some self-reported moods. The protein group reported<br />
increased hunger over time (p=.002) and felt less dejected (p=.04) than did the<br />
carbohydrate group, while caffeine produced no significant effects. Greater<br />
carbohydrate intake was associated with lower MAST scores, though the direction<br />
of causation is unclear, and it had no effect on the APTS. It is concluded that<br />
performance on the MAST and APTS are relatively unaffected by dietary<br />
differences of this type and magnitude.<br />
INTRODUCTION<br />
The automated Memory and Search Task (MAST) uses a hand-held<br />
computer to present stimuli consisting of randomized sequences of 16 alphabetic<br />
characters each along with randomized targets of 2, 4, or 6 letters that the subject<br />
identifies as being present within or absent from each stimulus (Salter et al, 1988).<br />
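As described, each MAST trial pairs a 16-letter stimulus with a 2-, 4-, or 6-letter target that is either present in or absent from it. The following is an illustrative reconstruction of such a trial under that description, not the original hand-held software:

```python
import random
import string

# Illustrative MAST-style trial (an assumption-laden sketch, not the Natick
# implementation): a 16-letter stimulus plus a 2-, 4-, or 6-letter target
# whose letters are either all drawn from the stimulus or all absent from it.
def make_trial(target_len, present, rng):
    stimulus = rng.sample(string.ascii_uppercase, 16)
    if present:
        target = rng.sample(stimulus, target_len)
    else:
        unused = [c for c in string.ascii_uppercase if c not in stimulus]
        target = rng.sample(unused, target_len)
    return "".join(stimulus), "".join(target)

rng = random.Random(7)
stim, targ = make_trial(4, present=True, rng=rng)
# Correct response: "present" iff every target letter occurs in the stimulus.
print(stim, targ, all(c in stim for c in targ))
```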
The first studies with the MAST used it as a tool to assess dietary effects on<br />
performance (Salter et al, 1988). These studies indicated a significant post-lunch<br />
slump in MAST scores followed by recovery later in the afternoon. Salter and Rock<br />
(1989) did not find a post-meal decrement in performance on the MAST when<br />
slightly different times were used for testing. This latter study did find, however, that<br />
the more protein subjects ate at lunch, the better they scored on the MAST<br />
afterwards. The purpose of the current study was to determine whether such<br />
nutrients/food ingredients as protein, carbohydrate, and caffeine would affect<br />
performance not only on the MAST but also on several subtests (pattern<br />
recognition, reaction time, symbolic reasoning, and hand-tapping) of the<br />
Automated Portable Test System or APTS (Bittner et al, 1985) and visual-analogue<br />
mood scales.<br />
Previous research has demonstrated that protein can enhance performance<br />
because it contains tyrosine, the amino acid precursor to norepinephrine, which<br />
helps the body function in states of arousal or stress (Lieberman et al, 1984). On<br />
the other hand, carbohydrate leads to insulin release which helps clear the blood<br />
of amino acids except tryptophan, resulting in greater passage of this serotonin<br />
precursor into the brain. Serotonin can induce a drowsy quiescent state capable of<br />
suppressing performance (Lieberman et al, 1982/83). Caffeine is a commonly<br />
used performance enhancer demonstrated to increase alertness and vigilance<br />
(Sawyer, Julia, and Turin, 1982). We particularly wanted to test whether caffeine<br />
would interact with either protein or carbohydrate in affecting performance.<br />
METHOD<br />
The subjects were military and civilian employees, males only, at the US<br />
Army Natick Research, Development & Engineering Center. All potential subjects<br />
were screened for previous caffeine use. Only those who normally consumed<br />
between 2 and 4 caffeinated beverages (coffee, tea, or soda) per day were<br />
retained. The subjects then filled out a questionnaire regarding their typical<br />
caffeine use, from which their total daily caffeine ingestion was estimated. The<br />
subjects were then split into a protein-lunch group (16 subjects) and a<br />
carbohydrate-lunch group (18 subjects) so that the average daily caffeine intake<br />
was equivalent in both groups.<br />
On the first day of testing, the subjects were trained in the use of the<br />
automated MAST, the APTS (using the pattern recognition, reaction time, symbolic<br />
reasoning, and hand-tapping subtests), and visual-analogue mood scales<br />
(indicating on a 100-mm line how relatively tense, hungry, dejected, tired, angry,<br />
vigorous, and confused they felt). On the following two days of testing, all subjects<br />
were fed the same standard, mixed-nutrient breakfast at 0730 hours, tested at 1000<br />
hrs, fed the experimental lunch at 1130, given a math exercise immediately after,<br />
then tested shortly after noon and finally at 1430. The timed math exercise (30<br />
minutes maximum) was used because previous studies (Morse et al, 1989) found<br />
that it served as an effective stressor to mobilize norepinephrine use. The protein<br />
lunch group was served 5 g/kg turkey breast, while the carbohydrate lunch group<br />
was provided with 5 g/kg sorbet. These two foods were chosen because previous<br />
research had demonstrated them capable of having behavioral effects (Spring,<br />
Lieberman, Swope, and Garfield, 1986). Subjects were instructed to eat as much<br />
of their test meals as they could, but there was a wide variation in the proportion<br />
consumed. Within each group, subjects rotated through two caffeine conditions in<br />
a counterbalanced order, drinking two cups of either caffeinated or decaffeinated<br />
coffee with lunch. Caffeine use was prohibited at other times during the study.<br />
RESULTS AND DISCUSSION<br />
Analysis of variance tests indicated no significant differences on MAST<br />
performance as a function of group (protein vs. carbohydrate), caffeine (or its<br />
absence), or the interaction of group and caffeine. Salter and Rock (1989) similarly<br />
found no major group effects due to nutrient type, but did find significant<br />
correlations between the proportion of protein actually consumed and<br />
performance. In Table 1 can be seen the correlations in the current study between<br />
the percent of nutrient consumed and MAST performance. Whereas Salter and<br />
Rock (1989) found a positive correlation for protein, the current study found a<br />
negative correlation for carbohydrate. Previous studies have found both types of<br />
effects (Lieberman et al, 1984). However, consideration of the time factor indicates<br />
that the significant negative correlation occurred even in the morning before<br />
Table 1<br />
Correlations Between Percent of Test Food Consumed<br />
and automated Memory and Search Task (MAST) scores<br />
Time:            Task Level:            Protein   Carbohydrate<br />
                                        (N=16)    (N=18)<br />
1 (1000 hrs)     2-character target      -.28       -.58*<br />
                 4-character target      -.36       -.58*<br />
                 6-character target      -.38       -.58*<br />
2 (1200 hrs)     2-character target      -.03       -.65**<br />
                 4-character target      -.07       -.37<br />
                 6-character target      -.13       -.66**<br />
3 (1430 hrs)     2-character target      -.06       -.79***<br />
                 4-character target      -.02       -.x5*<br />
                 6-character target      -.15       -.47*<br />
* p<.05   ** p<.01   *** p<.001<br />
consuming the carbohydrate. This study, then, is a clear example of correlation not<br />
implying causation. If anything, it appears that people who score lower on the<br />
MAST are inclined to eat more carbohydrate rather than the other way around.<br />
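Correlations like those in Table 1 are ordinary Pearson coefficients between percent of food consumed and score. A brief sketch with fabricated pairs, illustrating only the sign pattern, not the study's data:

```python
import math

# Pearson correlation between percent of test food consumed and a test score,
# of the kind reported in Table 1. The paired values below are invented.
def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

pct_consumed = [95, 80, 60, 100, 70, 85, 50, 90]
mast_score   = [210, 250, 280, 200, 260, 240, 300, 220]
r = pearson(pct_consumed, mast_score)
print(round(r, 2))   # negative: higher consumption pairs with lower score here
```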
The proportion of test meals consumed was also correlated with APTS<br />
performance, and there were no significant effects on the tasks of pattern<br />
recognition (stating whether two patterns of asterisks were the same or different),<br />
reaction time (pressing the number of whichever of the four boxes lights up), or symbolic<br />
reasoning (indicating whether each of several statements is true, for example, “A is<br />
in front of B--BA”). Data from several trials of the hand-tapping test were not recorded<br />
properly, so this variable could not be analyzed. See Table 2.<br />
Table 2<br />
Correlations Between Percent of Test Food Consumed<br />
and Automated Portable Test System (APTS) scores<br />
Test:         Time:       Protein   Carbohydrate<br />
                          (N=16)    (N=18)<br />
Pattern       1000 hrs     -.05      -.40<br />
Recognition   1200 hrs      .41      -.33<br />
              1430 hrs      .14      -.40<br />
Reaction      1000 hrs     -.22      -.27<br />
Time          1200 hrs     -.05      -.23<br />
              1430 hrs     -.03      -.37<br />
Symbolic      1000 hrs     -.25      -.25<br />
Reasoning     1200 hrs     -.07      -.44<br />
              1430 hrs      .21      -.27<br />
Correlations between percent consumption and various moods, however,<br />
were more often significant. Table 3 has the results, including just the moods with<br />
significant effects. Moods like dejection, fatigue, and vigor are not included in this<br />
table because none of the correlations were significant. Protein consumption was<br />
positively related to tension and anger, a finding confirming earlier reports<br />
(Banderet et al, 1986). However, the association held also in the morning before<br />
the protein meal, again damaging the case for causation. In addition to the<br />
Table 3<br />
Correlations Between Percent of Test Food Consumed<br />
and Visual-Analogue Mood scores<br />
Mood:         Time:<br />
Tense         1000 hrs<br />
              1200 hrs<br />
              1430 hrs<br />
Hungry        1000 hrs<br />
              1200 hrs<br />
              1430 hrs<br />
Angry         1000 hrs<br />
              1200 hrs<br />
              1430 hrs<br />
Confused      1000 hrs<br />
              1200 hrs<br />
              1430 hrs<br />
* p<.05<br />
REFERENCES<br />
Banderet, L. E., Lieberman, H. R., Francesconi, R. P., Shukitt, B. L., Goldman, R. F.,<br />
Schnakenberg, D. D., Rauch, T. M., Rock, P. B., and Meadors, G. F. (1986).<br />
Development of a paradigm to assess nutritive and biochemical substances<br />
in humans: A preliminary report on the effects of tyrosine upon altitude- and<br />
cold-induced stress responses. Presented at and published as Proceedings<br />
of the AGARD Aerospace Medical Panels Symposium, Biochemical<br />
Enhancement of Performance, Lisbon, Portugal, 30 Sep-2 Oct, 1986.<br />
Bittner, A. C., Smith, M. G., Kennedy, R. S., Staley, C. F., and Harbeson, M. M.<br />
(1985). Automated Portable Test (APT) System: Overview and prospects.<br />
Behavior Research Methods, Instruments, & Computers, 17, 217-221.<br />
Lieberman, H. R., Corkin, S., Spring, B. J., Garfield, G. S., Growdon, J. H., and<br />
Wurtman, R. J. (1984). The effects of tryptophan and tyrosine on human<br />
mood and performance. Psychopharmacology Bulletin, 20, 595-598.<br />
Lieberman, H. R., Corkin, S., Spring, B. J., Growdon, J. H., and Wurtman, R. J.<br />
(1982/83). Mood, performance, and pain sensitivity: Changes induced by<br />
food constituents. Journal of Psychiatric Research, 17, 135-145.<br />
Morse, D. R., Schacterle, G. R., Furst, L., Zaydenberg, M., and Pollack, R. L. (1989).<br />
Oral digestion of a complex-carbohydrate cereal: effects of stress and<br />
relaxation on physiological and salivary measures. American Journal of<br />
Clinical Nutrition, 49, 97-105.<br />
Salter, C. A., Lester, L. S., Dragsbaek, H., Popper, R. D., and Hirsch, E. (1988). A<br />
fully automated memory and search task. In A. C. F. Gilbert (Ed.),<br />
Proceedings of the 30th Annual Conference of the <strong>Military</strong> <strong>Testing</strong><br />
<strong>Association</strong>. Arlington, Virginia: <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>. Pp. 515-<br />
520.<br />
Salter, C. A., and Rock, K. L. (1989). Using the memory and search task to assess<br />
dietary effects. Proceedings of the 31st Annual Conference of the <strong>Military</strong><br />
<strong>Testing</strong> <strong>Association</strong>. San Antonio, Texas: <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>. Pp.<br />
701-706.<br />
Sawyer, D. A., Julia, H. L., and Turin, A. C. (1982). Caffeine and human behavior:<br />
Arousal, anxiety, and performance effects. Journal of Behavioral Medicine,<br />
5, 415-439.<br />
Spring, B. J., Lieberman, H. R., Swope, G., and Garfield, G. S. (1986). Effects of<br />
carbohydrates on mood and behavior. Nutrition Reviews/Supplement, 44,<br />
51-60.<br />
WHAT MAKES BIODATA BIODATA ?<br />
Fred A. Mael<br />
US Army Research Institute<br />
Interest in the use of biodata in personnel selection<br />
continues to grow in all branches of the armed services. Various<br />
researchers have advanced legal, moral, and conceptual criteria<br />
that define biodata items and differentiate them from those that<br />
appear in temperament, attitude, or interest measures. These<br />
stated criteria are often disputed by other researchers or<br />
ignored in practice, even by those who proposed them. Moreover, in practice,<br />
many items termed "biodata" are indistinguishable from other<br />
self-report items. The result has been a continued blurring of<br />
what constitutes biodata.<br />
The confusion is especially problematic in light of the<br />
claimed advantages of biodata. For example, biodata scales have<br />
been shown to be more resistant to social desirability faking<br />
than temperament scales (Telenson et al., 1983). However, this<br />
may be true only of certain types of biodata, such as verifiable<br />
items. Similarly, reviews of selection measures (Reilly & Chao,<br />
1982) stating that biodata generally achieve higher validities<br />
than temperament measures are uninterpretable without knowing<br />
what, other than empirical keying, differentiates biodata from<br />
other measures.<br />
The purpose of this paper is to review criteria that have<br />
been used to define biodata and differentiate it from other self-report<br />
measures. Drawing upon the work of previous researchers,<br />
the qualities that may uniquely define biodata across all<br />
applications are enumerated. Then, additional characteristics<br />
which may be desirable or legally required under certain<br />
circumstances are discussed. In the course of the discussion,<br />
differences between biodata and temperament scales are clarified,<br />
with the two viewed as potentially complementary, though not<br />
mutually exclusive, domains.<br />
The essence of biodata<br />
Biodata items attempt to measure previous and current life<br />
events which have shaped the behavioral patterns, dispositions,<br />
and values of the person. It is presumed that a person's outlook<br />
is affected by life experiences and that each experience has the<br />
potential to make subsequent life choices more or less desirable,<br />
palatable, or feasible. One possible reason is that the focal<br />
experience reinforces a pattern of behavior. Alternatively, the<br />
focal experience may be partly or wholly determined by earlier<br />
causal determinants - genetic, dispositional, or learned - which<br />
account for variations in both earlier and current behavior. A<br />
complete biodata measure should provide "a reasonably<br />
comprehensive description of the relevant behavioral and<br />
experiential antecedents" (Mumford & Owens, 1987, p. 3).<br />
Virtually all life experiences are potentially "job relevant",<br />
provided that they empirically differentiate better and poorer<br />
performers on a consistent basis.<br />
Biodata Item Attributes<br />
Historical versus Hypothetical. Conceptually, biodata<br />
should pertain solely to historical events, activities which have<br />
taken place, or continue to take place. This attribute would<br />
exclude behavioral intentions or expected behavior in a<br />
hypothetical situation.<br />
External versus Internal. Some have argued that biodata<br />
items should deal with external, though not necessarily publicly<br />
seen, actions. These criteria would exclude items about<br />
thoughts, attitudes, opinions, and unexpressed reactions to<br />
events. An item about what one tvnicallv does in situations<br />
could satisfy the historical/external criterion.<br />
Numerous biodata researchers have utilized non-external<br />
events in their biodata measures, and conceptually, non-external<br />
aspects of events are also capable of having significant impact<br />
on subsequent behavior. Nevertheless, the external event<br />
criterion may be crucial if claiming greater validity for biodata<br />
compared to temperament scales. Temperament scales require<br />
assessments of personal tendencies, often in areas in which<br />
people not only portray themselves favorably ("impression<br />
management"), but actually see themselves in an unrealistically<br />
favorable light ("self-deception") (Paulhus, 1984). For example,<br />
most employees overrate their work performance compared to that<br />
of peers. Nondepressed persons consistently overrate their<br />
performance, so much that realistic self-evaluation may be<br />
indicative of depression (Mischel, 1979). Similarly, negative<br />
and positive affect orientations have been shown to be correlated<br />
with response patterns on temperament and related scales. Thus,<br />
the "normal" tendency to overrate successes and underestimate<br />
failings can lead to self-deception and could possibly inflate<br />
responses to some temperament scales. By contrast, biodata<br />
scales dealing with external events purport to force the<br />
respondent to either answer honestly or consciously distort<br />
answers, with the assumption that fewer people will choose the<br />
latter.<br />
Objective and First-hand versus Subjective. Some who<br />
prefer that biodata be descriptions of external events also feel<br />
that biodata should be objective recollections, requiring only<br />
the faculty of recall. Subjective interpretation of events, such<br />
as assessing if one was "disappointed", "angry", or "depressed"<br />
in a given situation, would not fit this criterion. Evaluation<br />
of one's qualities or performance relative to that of others<br />
would also be considered subjective. A corollary would be that<br />
biodata items ask only for the first-hand knowledge of the<br />
respondent. Estimation of how others (peers, parents, teachers)<br />
would evaluate one's performance or temperament involves an<br />
additional level of speculative subjectivity. Subjective items<br />
would appear to increase the chance of self-deception. Although<br />
subjective corroboration from others is feasible, subjective<br />
items are never objectively verifiable, and hence the chance for<br />
social desirability faking is increased.<br />
Conversely, a number of biodata researchers have made<br />
frequent use of interpretive items. In some studies, subjective<br />
items have actually been shown to have higher predictive<br />
validities than objective ones. An advantage to subjective items<br />
that address self-perceptions is that they can better focus on<br />
unitary theoretical constructs. By contrast, performance of<br />
objective behaviors is often determined by multiple causes and<br />
dispositions, making it difficult to isolate the role of any one.<br />
Barge (1987) has provided evidence that homogeneous items,<br />
tapping a single disposition or tendency, are more predictive<br />
than heterogeneous items such as school or work performance.<br />
Construct-based items are also easier to use to develop<br />
rationally-based biodata scales. It would thus appear that the<br />
use of some subjective items may provide some countervailing<br />
advantages as well.<br />
Discrete versus Summarv Actions. Methodologically, it may<br />
be preferable to focus on discrete actions, dealing with a<br />
single, unique behavior (e.g. age when received driver's<br />
license), as opposed to summary responses (e.g. average time<br />
spent studying). Responses to discrete items only require memory<br />
retrieval, while summary items also require computation or<br />
estimation, thus increasing the chance of inaccuracy. However,<br />
the above preference for discrete actions would obtain only when<br />
the event is unique or singularly memorable. With a regularly<br />
performed behavior, summary recall could be more realistic and<br />
accurate than recall of a single, arbitrarily chosen instance.<br />
Verifiable. A verifiable item is an item that can be<br />
corroborated from an independent source. Item verifiability thus<br />
goes beyond both the external event and objective criteria. The<br />
optimal source of verification is archival data, such as school<br />
transcripts or work records. Alternatively, the testimony of<br />
knowledgeable persons, such as a teacher, employer, or coach, is<br />
also considered verification by most researchers. Asher (1972)<br />
and Stricker (1987) have advocated exclusive use of verifiable<br />
items, though others utilize non-verifiable items, and some<br />
advocate interleaving verifiable and non-verifiable items<br />
(Mumford et al., 1990).<br />
One reason to use verifiable items is to reduce social<br />
desirability faking and outright falsification. However,<br />
Shaffer, Saunders, and Owens (1986) have shown that social<br />
desirability distortion is not a serious concern with biodata.<br />
Previous research on false or inaccurate responding to verifiable<br />
biodata items has shown mixed results (Cascio, 1975; Goldstein,<br />
1971) which may be due partly to methodological factors (Mumford<br />
& Owens, 1987). Merely warning respondents that answers will be<br />
verified can reduce faking (Schrader & Osburn, 1977).<br />
Verifiability should be less necessary with discrete and publicly<br />
witnessed items for which "faking good" would require conscious<br />
lying. When developing biodata, obscuring the "right" answers<br />
and deleting transparent items should also discourage socially<br />
desirable responses, even without the threat of verification.<br />
Paradoxically, items which fit the narrowest definitions of "job<br />
relevant" and show the greatest point-to-point correspondence<br />
with future job performance would be most transparent and elicit<br />
the greatest need for verification.<br />
The issue of control. From the aforementioned perspective,<br />
that all life events have the potential to shape and affect later<br />
behavior, there is no reason to differentiate between experiences .<br />
that a person has consciously chosen to undertake and those that<br />
were components of the person's environment. In the same way<br />
that a decision to join ROTC or study chemistry may lead a person<br />
in a behavioral direction, personal characteristics or the<br />
climate in a person's home and community could also affect<br />
subsequent behavior. Moreover, even optional decisions and<br />
behaviors, such as smoking or amount of time spent studying, are<br />
partially shaped by noncontrollable influences. This view is<br />
reflected in the instruments of biodata researchers who freely<br />
utilize both "controllable" and "noncontrollable" biodata items<br />
(Glennon, Albright, & Owens, 1966). Stricker (1987), on the other<br />
hand, argues that it is unethical to evaluate people based on<br />
noncontrollable items pertaining to parental behavior, geographic<br />
background, or socioeconomic status. He also considers items<br />
dealing with skills and experiences not equally accessible to all<br />
applicants, such as tractor-driving ability or playing varsity<br />
football, to be unfair. Similarly, the developers of the Armed<br />
Services Applicant Profile (ASAP), a biodata measure of<br />
adaptability to the military, also attempted to delete all<br />
noncontrollable items from their instrument (Trent, Quenette, &<br />
Pass, 1989).<br />
In practice, however, consistent adherence to the control<br />
criterion would exclude all items pertaining to physical<br />
characteristics and educational level; behaviors, values, or<br />
interpersonal styles influenced by parental genetics or<br />
nurturing; and vocational interests and behavioral preferences<br />
partially shaped by one's environment. Strict adherence would<br />
thus lead to exclusion of most life experiences likely related to<br />
later behavior. It would also exclude many items typically found<br />
on school and job application blanks. This would present a<br />
severe constraint when sampling applicant pools without extended<br />
job histories, such as military applicants. It is not surprising<br />
that even some advocates of this criterion have been forced to<br />
violate it in their scales.<br />
Invasion of privacy<br />
A final concern involves invasion of privacy. Intrusive<br />
questions are mainly problematic with background checks that<br />
focus on previous criminal and aberrant behavior. In contrast,<br />
most biodata deal with behaviors whose revelation would not harm<br />
respondents. Some questions, such as those pertaining to marital<br />
status, age, and physical handicaps, may be invasive if the<br />
responses were to be placed in the employee's personnel folder,<br />
but not if the responses were used only by researchers to<br />
generate applicant scores. An additional reason not to reveal<br />
individual responses and their implications to decision-makers is<br />
in order to maintain biodata key confidentiality.<br />
Summary<br />
This paper proposes that the core attribute of a biodata<br />
item is that it addresses an historical event or experience. The<br />
rationale is that previous events shape the behavioral patterns,<br />
attitudes, and values of the person, and combine with individual<br />
temperaments to define the person's identity. Other attributes,<br />
though not defining biodata, may have methodological advantages.<br />
These include limiting items to those regarding external events,<br />
those that only require objective recollection of events, and<br />
those asking only for first-person recollections. Items<br />
involving discrete, unique events, and events that are verifiable<br />
are also favored by some for these reasons. However, these<br />
latter attributes may have their own limitations. Limiting<br />
biodata to controllable life events is seen as overly<br />
restrictive. Exclusive use of verifiable and especially<br />
controllable items may hamper efforts to cover the domain of<br />
relevant life events, as well as reduce validity. While clearly<br />
intrusive items are offensive and hence undesirable, definitions<br />
of and concerns about invasion of privacy will vary, depending on<br />
the situation.<br />
By attempting to measure historical events and experiences<br />
that may have impacted on behavioral tendencies, it should be<br />
possible to focus on a unique realm of individual differences not<br />
exhausted by temperament and other self-report measures. Perhaps<br />
biodata measures, as presently defined, could be used in tandem<br />
with temperament measures for optimal results. However,<br />
researchers should be exceedingly careful about making claims<br />
extolling biodata's virtues over other self-report measures.<br />
REFERENCES<br />
Asher, J. J. (1972). The biographical item: Can it be improved?<br />
Personnel Psychology, 25, 251-269.<br />
Barge, B. N. (1987). Characteristics of biodata items and their<br />
relationship to validity. Paper presented at the 95th annual<br />
meeting of the American Psychological Association, New York, NY.<br />
Cascio, W. F. (1975). Accuracy of verifiable biographical<br />
information blank responses. Journal of Applied Psychology,<br />
60, 767-769.<br />
Glennon, J. R., Albright, L. E., & Owens, W. A. (1966). A catalog<br />
of life history items. Greensboro, NC: Creativity Research<br />
Institute of the Richardson Foundation.<br />
Goldstein, I. L. (1971). The application blank: How honest are<br />
the responses? Journal of Applied Psychology, 55, 491-492.<br />
Mischel, W. (1979). On the interface of cognition and<br />
personality: Beyond the person-situation debate. American<br />
Psychologist, 34, 740-754.<br />
Mumford, M. D., & Owens, W. A. (1987). Methodology review:<br />
Principles, procedures, and findings in the application of<br />
background data measures. Applied Psychological Measurement,<br />
11, 1-31.<br />
Mumford, M. D., Owens, W. A., Stokes, G. S., Sparks, C. P., &<br />
Hough, L. (1990). Developmental determinants of individual<br />
action: Theory and practice in the application of background<br />
data measures. Unpublished manuscript.<br />
Paulhus, D. L. (1984). Two-component models of socially desirable<br />
responding. Journal of Personality and Social Psychology,<br />
46, 598-609.<br />
Reilly, R. R., & Chao, G. T. (1982). Validity and fairness of<br />
some alternative employee selection procedures. Personnel<br />
Psychology, 35, 1-62.<br />
Schrader, A., & Osburn, H. G. (1977). Biodata faking: Effects of<br />
induced subtlety and position specificity. Personnel<br />
Psychology, 30, 395-405.<br />
Shaffer, G. S., Saunders, V., & Owens, W. A. (1986). Additional<br />
evidence for the accuracy of biodata: Long-term retest and<br />
observer ratings. Personnel Psychology, 39, 791-809.<br />
Stricker, L. J. (1987). Developing a biographical measure to<br />
assess leadership potential. Presented at the Annual Meeting<br />
of the Military Testing Association, Ottawa, Ontario.<br />
Telenson, P. A., Alexander, R. A., & Barrett, G. V. (1983).<br />
Scoring the biographical information blank: A comparison of<br />
three weighting techniques. Applied Psychological<br />
Measurement, 7, 73-80.<br />
Trent, T., Quenette, M. A., & Pass, J. J. (1989). An<br />
old-fashioned biographical inventory. Paper presented at the<br />
97th Annual Convention of the American Psychological<br />
Association, New Orleans, LA.<br />
JOB SAMPLE TEST FOR NAVY FIRE CONTROLMAN<br />
Susan Van Hemel, PhD<br />
Frank Alley, PhD<br />
Syllogistics, Inc.<br />
Springfield, VA 22151<br />
Herbert George Baker, PhD<br />
Laura E. Swirski<br />
Navy Personnel Research and Development Center<br />
San Diego, CA 92152-6800<br />
ABSTRACT<br />
The Navy has developed job sample tests for a number of its<br />
enlisted occupations (or ratings) as part of the<br />
Joint-Service Job Performance Measurement Program. One of<br />
those ratings is Fire Controlman (FC). This paper details<br />
the development of hands-on tests for first-term FC data and<br />
radar personnel, and their administration to a sample of FCs<br />
(N=103). The results of testing are discussed, showing the<br />
relationship of test scores to several criteria.<br />
INTRODUCTION<br />
Several reports detail the research strategy and<br />
purposes of the Joint-Service Job Performance Measurement<br />
(JPM)/Enlistment Standards Project (Office of the Assistant<br />
Secretary of Defense, 1982), and the origin and scope of the<br />
Navy JPM Program (Laabs & Berry, 1987). In the Navy's<br />
effort, performance measures are being developed for several<br />
ratings, one of which is that of fire controlman (FC).<br />
The MK 86 Gun Fire Control System (GFCS) is used to<br />
control various surface ship-mounted guns, which are used<br />
against both surface and airborne targets. First-term MK 86<br />
GFCS FCs are currently trained and deployed in two different<br />
specialties. Both NECs operate the MK 86 GFCS, but NEC 1125<br />
specializes in maintenance of the radar subsystem<br />
while NEC 1129 specializes in maintenance of the data<br />
processing subsystem. Both types of MK 86 FCs go through a<br />
training pipeline which includes Basic Electricity and<br />
Electronics, a Fire Control A-school, and a C-school for<br />
either data or radar. MK 86 FCs are usually<br />
cross-trained on the second subsystem at the end of their<br />
first or beginning of their second tour of duty.<br />
APPROACH<br />
to develop materials to the best level of detail possible,<br />
then returned to another SME panel for critique. Use of SMEs<br />
from all three MK 86 FC C-schools ensured that all of the<br />
sites would have input into the test development process.<br />
Tryout was conducted on actual equipment to be used in<br />
testing. Final SME review and item refinement followed the<br />
tryout, and preceded the field test.<br />
Verification Of Tasks Selected For Testing<br />
The first-term MK 86 GFCS FC is fully trained on only<br />
one of the subsystems, and although he (the rating is closed<br />
to women) may work on the other subsystem, he is not<br />
qualified on it through training or experience. Because of<br />
this, a single set of tasks could not be used; separate test<br />
items had to be developed for each subspecialty.<br />
The first step in the development process was to verify<br />
the task list. Panels of MK 86 GFCS SMEs convened at the<br />
three MK 86 GFCS C-schools reviewed the list, and suggested<br />
substitutes for tasks found unsuitable. Each task was<br />
evaluated as to its appropriateness for hands-on testing,<br />
according to the following criteria: (1) representativeness<br />
of the first-termer's job, (2) mission criticality, (3)<br />
frequency of performance, (4) sufficient variability in<br />
performance, and (5) practicality for testing at the<br />
C-school sites. In addition, the SMEs were asked to consider<br />
the need for comprehensive task coverage, and equivalent<br />
test difficulties for the two NECs.<br />
The SMEs provided detailed information on the 7 test<br />
items for each test. Specific subtasks were confirmed for<br />
each major task, along with specific faults for the<br />
diagnostic and troubleshooting items. Additional technical<br />
documentation was provided for use in developing scoring<br />
sheets for proceduralized tasks. The test tasks were<br />
sequenced to provide the smoothest and quickest possible<br />
progression through the test, and equipment requirements<br />
were verified and refined. The information gathered enabled<br />
the test development team to begin writing draft test items,<br />
and to prepare for the first SME panel.<br />
After the final SME review, the information and<br />
revisions were incorporated into draft test items. Plans<br />
were made for trying out the test items on a small sample of<br />
first-term MK 86 GFCS FCs.<br />
Test Item Tryout<br />
Four first-term MK 86 GFCS FCs, stationed on ships at<br />
Norfolk, VA, served as test subjects. Two were data FCs (NEC<br />
1129) and two radar (NEC 1125). The equipment used for the<br />
tryout was the Dam Neck MK 86 GFCS training system, which<br />
includes a full set of actual equipment equivalent to the<br />
MOD 10 Capability Expanded shipboard system. Except for an<br />
added simulation capability for the interface to the<br />
equipment controlled by the system, and a "fan-out" version<br />
of the UYK-7(V) computer (an extra UYK-7 with the circuit<br />
card planes exposed to permit easy access), the system<br />
replicates a ship-mounted system, and is composed entirely<br />
of actual equipment, housed in two connecting rooms in the<br />
school building, with a full set of system technical<br />
documentation available in the training laboratory, along<br />
with all required tools and test equipment.<br />
The purposes of the tryout were to: (1) verify that the<br />
test items would perform properly with the equipment; (2)<br />
ensure that instructions were clear and accurate; (3)<br />
determine whether the suggested item time limits were<br />
realistic; (4) verify that there would be some variability<br />
among subjects in performance on the items; and (5) reveal<br />
unanticipated problems of any sort. Because of the small<br />
sample, there was no attempt to gather statistical<br />
information.<br />
Because one of the purposes of the tryout was to<br />
determine whether the time limits were reasonable, a subject<br />
was allowed to continue working on a task until it was<br />
completed if he was making progress on the task, and the<br />
completion time was recorded. All subjects were able to<br />
complete the test within four hours or less, but there was<br />
considerable variability in the completion time on most of<br />
the items. This small sample did not permit confident<br />
prediction of the best time limits for all items, but did<br />
suggest some changes to suggested time limits.<br />
The results of the tryout were positive. No major<br />
problems arose during the tryout. The test items performed<br />
well on the equipment, with only minor adjustments in<br />
procedure required (some improvements in the techniques of<br />
fault insertion, to ensure that prefaulted modules and<br />
grounding straps were not visible to examinees). The<br />
instructions were understandable, with a few areas to be<br />
clarified. The final items included these changes.<br />
Final SME Review<br />
The revised test items were reviewed by SMEs (two data<br />
instructors and two radar instructors) at Great Lakes. The<br />
SMEs clarified some technical issues (e.g., documentation<br />
nomenclature and fault insertion techniques).<br />
Field Testing<br />
Site Preparation<br />
considerable time preparing for the tryout, rehearsing each<br />
item, verifying that the faults to be inserted would produce<br />
the desired indications, and ensuring that the training<br />
equipment was in good condition for testing. Test<br />
administrator training consisted of review and practice of<br />
the test items and procedures.<br />
Testing Personnel<br />
The test administrator was a retired E-9 MK 86 GFCS<br />
technician, who had served as a MK 86 Course Director for<br />
the three years preceding his retirement. One of the school<br />
senior staff (E-7/8) was available throughout. He<br />
participated in the preparation for testing, helped with<br />
equipment setup and was able to solve the few equipment<br />
problems which occurred. Two observers were on site, with<br />
one present in the testing area during all test periods.<br />
Test Subjects<br />
The sample consisted of 103 individuals engaged in<br />
their first term of military service. There were 45<br />
individuals tested in Dam Neck and 53 individuals tested in<br />
San Diego. All individuals in this sample were male. The<br />
majority of the FCs in this sample were in the third,<br />
fourth, and fifth years of their military service<br />
obligations. Sixty-one individuals were classified in the<br />
radar subrating and 42 individuals were classified in the<br />
data subrating. All (100%) of the FCs were high school<br />
graduates who either earned diplomas or GED equivalents.<br />
Equipment<br />
The equipment used for the field test was the same as<br />
that used for the tryout, plus equivalent equipment at the<br />
San Diego C-school.<br />
Procedure<br />
When the subjects arrived at the testing site, they<br />
were given a brief introduction to the project, with an<br />
explanation that their performance would in no way affect<br />
their service records, and would not be reported to anyone<br />
but project staff.<br />
At the beginning of each testing session, the subject<br />
was given oral and written instructions on the testing<br />
procedure and the ground rules for the testing. Some<br />
biographical data were collected, and then the first item<br />
was administered. For each item, the test administrator<br />
gave oral and written instructions on the task requirements<br />
and the time allowed for completion. The subject was<br />
encouraged to ask questions before beginning the task. When<br />
the subject indicated that he was ready to begin, the test<br />
administrator instructed him to start and began timing.<br />
At several points in the testing sequence it was<br />
necessary for the test administrator to insert or remove<br />
fault conditions or otherwise prepare the equipment for the<br />
next item. At these times (3 or 4 per test) the subject was<br />
excused and given a break of approximately five minutes.<br />
Throughout testing, the test administrator observed the<br />
subject's actions, checking off steps performed in<br />
procedures on the scoring sheets provided, and recording and<br />
evaluating troubleshooting (non-procedural) actions on other<br />
forms. When necessary, the test administrator queried the<br />
subject to determine what he was doing or attempting. Time<br />
to complete each task was also recorded. Upon completion of<br />
testing, each subject was asked how frequently he performed<br />
each of the tested tasks on the job, and when he had most<br />
recently performed each task.<br />
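The checked-off procedural steps described above reduce naturally to a task score. The paper states only that each task yielded a 0-100 score, not the exact scoring rule, so the percent-of-steps computation below is an assumed illustration:<br />

```python
def task_score(steps_checked):
    """steps_checked: one boolean per procedural step on the scoring
    sheet (True = step performed correctly). Returns the percentage
    of steps performed, on a 0-100 scale."""
    return 100.0 * sum(steps_checked) / len(steps_checked)
```

For example, a proceduralized task with three of four steps checked off would score 75 under this sketch.<br />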
RESULTS<br />
Each of the FC hands-on performance tests consisted of<br />
7 tasks, each of which yielded a single, overall score<br />
ranging from 0 to 100. Correlations were computed for the<br />
radar and data subratings, combined and separately, and are<br />
shown in Table 1. The correlation between overall<br />
performance on the hands-on test with AFQT for the radar and<br />
data subratings combined was -.03. Correcting for<br />
restriction in range resulted in a correlation of .12. The<br />
correlation between overall performance on the hands-on test<br />
with AFQT for the data subrating was .30. The correlation<br />
between overall performance on the hands-on test with AFQT<br />
for the radar subrating was -.10. Correcting for<br />
restriction in range resulted in a correlation of .17 for<br />
the data subrating and .14 for the radar subrating. None of<br />
the correlations was significant.<br />
Combined (Data and Radar)<br />
Hands-On and AFQT: -.03<br />
Hands-On and AFQT, corrected for restriction in range: .12<br />
Data<br />
Hands-On and AFQT: .30<br />
Hands-On and AFQT, corrected for restriction in range: .17<br />
Radar<br />
Hands-On and AFQT: -.10<br />
Hands-On and AFQT, corrected for restriction in range: .14<br />
Table 1. Correlations Between Hands-On Performance with AFQT<br />
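Range-restriction corrections of the kind reported above are commonly computed with Thorndike's Case 2 formula. The paper does not state which formula or which standard deviations were used, so the sketch below, with hypothetical AFQT standard deviations, is illustrative only:<br />

```python
import math

def correct_restriction(r, sd_unrestricted, sd_restricted):
    """Thorndike Case 2 correction for direct range restriction on
    the predictor: given the observed correlation r in the restricted
    sample and the predictor's restricted and unrestricted standard
    deviations, estimate the unrestricted correlation."""
    k = sd_unrestricted / sd_restricted  # ratio > 1 under restriction
    return (r * k) / math.sqrt(1.0 - r * r + r * r * k * k)
```

With hypothetical standard deviations of 25 in the applicant population and 15 in the selected sample, an observed r of .30 would correct upward to about .46; note that Case 2 preserves the sign of the observed correlation and increases its magnitude when the variance ratio exceeds one.<br />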
REFERENCES<br />
Laabs, G. J., & Berry, V. M. (1987, August). The Navy<br />
job performance measurement program: Background,<br />
inception, and current status (NPRDC TR 87-34).<br />
San Diego: Navy Personnel Research and Development<br />
Center.<br />
Office of the Assistant Secretary of Defense (MRA&L). (1982,<br />
December). Joint-Service efforts to link enlistment<br />
standards and job performance: First annual report<br />
to the House Committee on Appropriations. Washington,<br />
DC: Author.<br />
ASVIP: AN INTEREST INVENTORY USING<br />
COMBINED ARMED SERVICES JOBS<br />
Herbert George Baker, PhD<br />
Marjorie M. Sands<br />
Navy Personnel Research and Development Center<br />
San Diego, CA 92152-6800<br />
Arnold R. Spokane, PhD<br />
Spokane Career Associates<br />
Allentown, PA 18104<br />
ABSTRACT<br />
A number of vocational interest inventories have<br />
been developed by the Armed Services for use in<br />
guiding enlisted personnel into military<br />
occupations. The instruments have used<br />
occupational activities, job titles, recreational<br />
activities, and so forth -- or, a combination of<br />
such elements. Test subjects then indicate their<br />
interests or preferences for each item. Although<br />
great efforts have been made to cross-code<br />
military jobs between the Services and with<br />
civilian occupations, until now no interest<br />
measure has used the combined Armed Services jobs.<br />
This paper describes the development and<br />
administration of the Armed Services Vocational<br />
Interest Profile (ASVIP). The instrument uses the<br />
job titles (officer and enlisted) found in the<br />
Military Career Guide, the jobs also having been<br />
assigned three-letter Holland Codes. In scoring,<br />
results indicate the most preferred Holland Code,<br />
plus an indication of high or low preferred<br />
occupational level. This paper reports on a study<br />
to measure the endorsement of the<br />
combined-Services jobs. Suggestions are made for<br />
use and for further research.<br />
INTRODUCTION<br />
Vocational interests have long been recognized as one of the<br />
many individual characteristics that affect occupational<br />
exploration, job acquisition, work satisfaction, and,<br />
perhaps, performance. There are many theories of vocational<br />
interests and job preferences, and a great number of<br />
instruments have been developed to identify and measure<br />
vocational interests. One of the major uses of these<br />
instruments is in guiding young people into the types of<br />
work for which their interests best suit them.<br />
Similarly, a number of vocational interest inventories have<br />
been developed by the Armed Services for use in guiding<br />
enlisted personnel into military occupations. Examples<br />
include the Vocational Interest Career Examination (Alley,<br />
1978), developed by the Air Force, and the Navy Vocational<br />
Interest Inventory (Abrahams, Lau, & Neumann).<br />
Although research has shown promise to enhance the selection<br />
and classification processes through the incorporation of a<br />
formal, measured interest component, with the exception of<br />
the Air Force, interests have remained an experimental as<br />
opposed to an operational consideration.<br />
The various vocational interest instruments developed by the<br />
Armed Services have used occupational activities, job<br />
titles, recreational activities, and so forth -- or, a<br />
combination of such elements. Test subjects are asked to<br />
indicate their interests or preferences for each item.<br />
Scoring systems then report out an interest type, match the<br />
subject with an occupational area, or in some other way<br />
indicate the interests of the individual.<br />
A few years ago, great efforts were made to cross-code<br />
military jobs between the Services and with civilian<br />
occupations, in a project sponsored by the Office of the<br />
Assistant Secretary of Defense (FM&P) (Dale, Wright, Haven,<br />
Pavlak, & Lancaster, 1989). The result was a taxonomy of<br />
what may be called combined-Services jobs -- identical to no<br />
specific job, but incorporating occupational information<br />
from each Service that has a similar job (plus the Coast<br />
Guard). It should be noted here that some jobs (e.g.,<br />
infantrymen, dentists) are not represented in the<br />
occupational structure of all the Services.<br />
The combined-Services job taxonomy offers a number of<br />
research opportunities using occupational information<br />
specific to DOD jobs. However, to date, no interest measure<br />
has used the combined Armed Services jobs.<br />
APPROACH<br />
The combined-Services jobs, both enlisted (N=134) and<br />
officer (N=71), listed in the Military Career Guide<br />
(Department of Defense, 1988) were merged and alphabetized<br />
into a numbered list of 205 items. While there has been much<br />
controversy over the wisdom of using job titles in interest<br />
measurement, substantial research supports their use. More<br />
recently, Holland, Gottfredson, and Baker (1990), in<br />
research with Navy recruits, found that use of job titles<br />
was both feasible and meaningful. Arguably, the Navy's job<br />
titles are the most esoteric and potentially confusing to a<br />
young person, yet there were few, if any, problems in their<br />
use with young male sailors. Consequently, the even more<br />
understandable combined-Services job titles were considered<br />
fully suitable for use as items on an interest instrument.<br />
The 205 job titles thus became items on an inventory, the<br />
Armed Services Vocational Interest Profile (ASVIP), as shown<br />
in Figure 1. The answer sheet lists, for each item, three<br />
answer options, L, I, and D (for Like, Indifferent, and<br />
Dislike, respectively). Typical scoring strategies using<br />
these options call for simply disregarding the I responses,<br />
and subtracting the number of Ds from the number of Ls.<br />
44. Construction Equipment Operators<br />
45. Corrections Specialists<br />
46. Court Reporters<br />
47. Data Entry Specialists<br />
48. Data Processing Equip. Repairers<br />
49. Data Processing Managers<br />
50. Dental Laboratory Technicians<br />
51. Dental Specialists<br />
52. Dentists<br />
53. Detectives<br />
54. Dietitians<br />
55. Dispatchers<br />
56. Divers<br />
Figure 1. Examples of Items<br />
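The L/I/D scoring strategy just described can be sketched in a few lines of Python; the function name and response encoding here are illustrative assumptions, not part of the ASVIP itself.

```python
def lid_score(responses):
    """Score a set of L/I/D interest responses.

    Following the typical strategy described above: 'I'
    (Indifferent) responses are disregarded, and the number of
    'D' (Dislike) responses is subtracted from the number of
    'L' (Like) responses.
    """
    likes = sum(1 for r in responses if r == "L")
    dislikes = sum(1 for r in responses if r == "D")
    return likes - dislikes

# Example: 4 Likes, 2 Dislikes, 1 Indifferent -> score of 2
print(lid_score(["L", "L", "D", "I", "L", "D", "L"]))  # 2
```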
The ASVIP was administered to samples of male (N=150) and<br />
female (N=150) Navy recruits, at the Great Lakes, Illinois<br />
and Orlando, Florida Navy Recruit Training Commands. There<br />
was no time limit for the test. Completion times ranged from<br />
18 to 32 minutes, with a mean of 24 for males and 23 for<br />
females.<br />
This pilot study, in addition to assessing the testing<br />
logistics for the instrument, was designed to reveal the<br />
levels of endorsement for the 205 jobs. Thus, for the effort<br />
reported herein, only the L responses were considered in the<br />
scoring process.<br />
RESULTS<br />
Each of the 205 jobs received some endorsement from this<br />
sample of Navy recruits, even though not all of the jobs can<br />
be found in the Navy. The range of endorsement (out of a<br />
possible 300) is from 204 for Photographers to 25 for<br />
Clothing and Fabric Repairers. Table 1 shows the cumulative<br />
frequencies with which each of the officer and enlisted jobs<br />
was endorsed. That is, the table shows the number of times<br />
the LIKE response option was chosen for each item.<br />
CONCLUSIONS<br />
The results of this pilot study show that there is<br />
endorsement for all combined-Services jobs, and that there<br />
is a reasonable dispersion across all of the job titles.<br />
This suggests that the ASVIP might be useful as an accession<br />
tool with which to begin acquainting young people with the<br />
occupational opportunities offered by the military, that<br />
is, as a guide for occupational exploration into the<br />
military working community. Administration to other-Service<br />
and civilian samples is an obvious necessity before any firm<br />
conclusions could be drawn as to the feasibility for use of<br />
the ASVIP in job exploration, counseling, and<br />
classification.<br />
RECOMMENDATIONS FOR FURTHER RESEARCH<br />
A number of research opportunities suggest themselves at<br />
this point. One would be to compare the strength of<br />
endorsement across the 205 job titles with the endorsement<br />
of similar civilian job titles, the latter being information<br />
already available in the research literature.<br />
At the time of data collection, gender information was also<br />
collected. This enables a study of differential response<br />
patterns between male and female Navy recruits. Also, data<br />
were collected with an alternative instrument using the same<br />
items, but listing them within the categories used in the<br />
Military Career Guide, rather than in an alphabetical<br />
listing. This makes it possible to study the effects of<br />
presenting items in simple alphabetization versus presenting<br />
them in ways that make possible the influences of category<br />
names on response patterns.<br />
Furthermore, each of the 205 combined-Services jobs was<br />
coded using the Holland three-letter occupational coding<br />
system (Holland, 1985). Several studies suggest themselves:<br />
(1) comparisons between male and female endorsements across<br />
the six Holland primary codes; (2) assessment of individual<br />
response consistency within each Holland primary code; and<br />
(3) tracking the subjects and comparing performance<br />
evaluations in light of the congruence between interests and<br />
actual job assignments in the Navy.<br />
Other possibilities include studying the differential<br />
response patterns for high and low aspiration levels (i.e.,<br />
officer and enlisted jobs), in terms of both male-female<br />
differences and intra-individual consistency.<br />
Finally, the much-discussed impact of forward area<br />
assignment on women's job aspirations can be addressed in a<br />
small way by using the ASVIP. The instrument should be<br />
administered to another Navy female recruit population a few<br />
years hence to assess the impact of Operation Desert Shield<br />
on women's job aspirations.<br />
REFERENCES<br />
Alley, W. E. (1978). Vocational Interest Career Examination:<br />
Use and Application in Counseling and Job Placement<br />
(AFHRL-TR-78-62). Brooks Air Force Base, TX: Personnel<br />
Research Division.<br />
Abrahams, N. M., Lau, A. W., & Neumann, I. (1968). An<br />
Analysis of the Navy Vocational Interest Inventory as a<br />
Predictor of School Performance and Rating Assignment<br />
(NPRDC-SRR-69-11). San Diego: Navy Personnel Research<br />
Activity.<br />
Dale, C., Wright, G., Haven, R., Pavlak, M., & Lancaster, A.<br />
(1989). The DOD Military/Civilian Master Crosswalk Project.<br />
Proceedings of the 31st Annual Conference of the Military<br />
Testing Association. San Antonio, TX: Air Force Human<br />
Resources Laboratory and USAF Occupational Measurement<br />
Center, pp. 250-255.<br />
Department of Defense (1988). Military Career Guide<br />
1988-1989. Washington, DC: Author.<br />
Holland, J. L., Gottfredson, G. D., & Baker, H. G. (1990).<br />
Validity of Vocational Aspirations and Interest Inventories:<br />
Extended, Replicated, and Reinterpreted. Journal of<br />
Counseling Psychology, 37, 3, pp. 337-342.<br />
Holland, J. L. (1985). Manual for the Vocational Preference<br />
Inventory. Odessa, FL: Psychological Assessment Resources.<br />
PREDICTING PERFORMANCE WITH BIODATA<br />
Morris S. Spier, Ph.D.<br />
Somchai Dhammanungune, Ph.D.<br />
U.S. <strong>International</strong> University<br />
Herbert George Baker, Ph.D.<br />
Laura E. Swirski<br />
Navy Personnel Research and Development Center<br />
ABSTRACT<br />
A scored biographical questionnaire was developed and<br />
administered to a sample of Navy Fire Controlmen in two subratings:<br />
radar operations and data processing. The subjects<br />
were subsequently administered an extensive, hands-on test<br />
of technical proficiency. A correlational analysis<br />
identified 15 items that may predict proficiency for the<br />
radar subrating, and 20 items which may predict job<br />
performance for the data processing subrating. Cross-validation<br />
is needed to confirm the findings.<br />
INTRODUCTION<br />
The notion that past behavior is the best predictor of<br />
future behavior both supports and receives support from the<br />
use of scored autobiographical questionnaires. Biodata has<br />
demonstrated its usefulness in predicting a range of factors<br />
in the employment setting, including: (1) career<br />
progression; (2) turnover/job tenure; (3) job satisfaction;<br />
and (4) trainability. The convergence of the findings to<br />
date supports the notion that biodata approaches tend to be<br />
excellent predictors of a wide range of employment-related<br />
criteria.<br />
The Armed Services, in cooperation with the Department of<br />
Defense (DOD), are currently engaged in a Joint-Services Job<br />
Performance Measurement (JPM) Project of which the present<br />
research is a subtask. The larger project is investigating<br />
the feasibility of measuring on-the-job performance with an<br />
aim toward using the measures to set military enlistment<br />
standards. As a part of its contribution to the Joint-<br />
Services Project, the Navy (Laabs & Berry, 1987) is<br />
developing performance measures for a number of occupational<br />
specialties (ratings), including that of Fire Controlman<br />
(FC).<br />
There are, thus, separate proficiency tests for radar and<br />
data processing personnel. Scoring the test is done using a<br />
scoring sheet to grade steps in the process as having been<br />
completed either "correctly" or "incorrectly," and to grade<br />
any products produced as a part of the process as either<br />
"acceptable" or "unacceptable." The final score is a tally<br />
of the correct and acceptable actions and products.<br />
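As a minimal sketch, the tally described above (correct steps plus acceptable products) might be implemented as follows; the boolean encoding of the scoring sheet is an assumption made here for illustration.

```python
def hands_on_score(steps, products):
    """Tally a hands-on proficiency test.

    steps:    list of bools, True where a process step was graded
              "correctly" completed
    products: list of bools, True where a product was graded
              "acceptable"
    The final score is the count of correct steps plus the count
    of acceptable products.
    """
    return sum(steps) + sum(products)

score = hands_on_score(steps=[True, True, False, True],
                       products=[True, False])
print(score)  # 4
```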
METHOD<br />
The purpose of the present research was to develop an<br />
autobiographical questionnaire and to determine the<br />
relationship between scores on the biodata instrument and<br />
performance on the hands-on tests.<br />
Biodata Questionnaire Development<br />
A 124-item draft version of the Personal Activities<br />
Inventory was developed, based on a review of the relevant<br />
literature, and on the nature of the critical tasks to be<br />
performed during the job performance test. Emphasis was<br />
placed on biodata factors associated with mechanical<br />
interests, abilities, and experience, numerical and<br />
technical/scientific interests and abilities, past<br />
experience with computers, and on work, academic, and<br />
personal experiences that might be reasonably expected, on<br />
an "armchair" basis, to be related to task performance.<br />
Attention was similarly given to the development of items<br />
that might reflect the cognitive (e.g., attention to detail)<br />
and social (e.g., working alone or with others) processes<br />
that might be reflected in task proficiency. The 124 items<br />
were classified into 24 broader Biodata Factors. The draft<br />
version of the Inventory was reviewed and, following minor<br />
refinements, was pretested on a small sample (N=15) to<br />
determine ease of administration. No problems were found.<br />
Subjects<br />
Subjects for the biodata testing were first-term FCs. The<br />
103 sailors who were scheduled to be administered the hands-on<br />
job performance measurement test were, thus, a sample of<br />
convenience for the present study. While predictor<br />
(biodata) scores were collected for all 103 subjects, both<br />
predictor and criterion (job performance) data were<br />
available for only 56 of the total sample tested, 25<br />
(44.6%) radar and 31 (55.4%) data processing.<br />
Administration of the "Personal Activities Inventory"<br />
The final version of the Inventory was administered at Dam<br />
Neck and San Diego. Subjects were logged-in, given the test<br />
booklet and answer sheet, and instructed to begin. There was<br />
no time limit.<br />
Analysis of the Data<br />
Hands-on test data were entered into the computer. The raw<br />
scores for each of the seven critical tasks were summed for<br />
each subject in the form of a standard score. Data were<br />
analyzed separately for the two subratings. The response<br />
format of each item was the determining factor in the<br />
analysis. For biodata items in which the response options<br />
represented a continuum, the biodata scores were related to<br />
the job proficiency scores using the Pearson Product Moment<br />
Correlation. Items with dichotomous or discontinuous<br />
response options were analyzed using the Point Bi-Serial<br />
Correlation.<br />
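Both analyses can be sketched with one routine, since a point-biserial correlation is numerically a Pearson r computed with the dichotomous variable coded 0/1; the sample data below are invented purely for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# A point-biserial correlation is the same quantity with the
# dichotomous item coded 0/1, so one routine serves both the
# continuous and the dichotomous biodata items.
continuous_item = [1, 2, 3, 4, 5, 6]        # e.g., a 1-6 rating scale
dichotomous_item = [0, 0, 1, 0, 1, 1]       # e.g., yes/no item
proficiency = [50.0, 52.0, 61.0, 55.0, 64.0, 66.0]

print(round(pearson_r(continuous_item, proficiency), 3))
print(round(pearson_r(dichotomous_item, proficiency), 3))
```

In practice a statistics package would be used, but the arithmetic is no more than this.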
RESULTS<br />
FC-Radar Operations Personnel<br />
Table 1 shows that nine (9) Biodata Factors contained items<br />
which correlated at a statistically significant level with<br />
the job performance data of the radar personnel. It is<br />
interesting to note that four of the items fall within the<br />
Adjustment/Emotional Maturity Factor; an additional four<br />
items deal with some aspect of Technical/Scientific,<br />
Mechanical, or Numerical Factors. Overall, 13 separate<br />
items validated against the criterion data. Table 2<br />
presents the results of the Pearson Product Moment<br />
Correlations (continuous to continuous variables) for radar.<br />
The validity coefficients range from .322 (p < .05) to .575 (p<br />
Table 6 presents the results of the Point Bi-Serial<br />
Correlations (dichotomous to continuous variables) for the<br />
data processing subrating. Note, again, that Item #61 and<br />
Item #75 each have two response foils that reach statistical<br />
significance. As a result, the 12 Biodata Factors that<br />
validated for data processing contain 20 statistically<br />
significant validity coefficients.<br />
DISCUSSION<br />
The data from the present study, while based on relatively<br />
small samples and still needing cross-validation, suggest<br />
optimism. Among radar persons, 15 validity coefficients<br />
reached levels of statistical significance across 9 Biodata<br />
Factors. Among data processing people, 20 validity<br />
coefficients reached statistically significant levels.<br />
Moreover, the size of the coefficients is consistent with<br />
those reported in the literature for job proficiency in<br />
relation to biodata predictors (Mumford & Owens, 1987). In<br />
fact, the correlations are larger than those reported for<br />
other uses of biodata to predict military proficiency where<br />
only ratings, rankings, and archival data were used as the<br />
criterion (Barge & Hough, 1986).<br />
CONCLUSIONS<br />
A correlational analysis identified 15 items that may predict<br />
proficiency for the radar subrating and 20 for the data<br />
processing subrating. The data suggest that a biodata test may be a<br />
useful surrogate for job proficiency tests. However, the<br />
limitations of the study, for example, the restricted sample<br />
size, make it essential that these findings be cross-validated<br />
to confirm and establish the predictive factors.<br />
It is further recommended that the emergent "profile" of the<br />
ratings be used to generate hypotheses about factors that<br />
may be predictive and thus lead to a higher proportion of<br />
discriminating items. Lastly, thought should be given to<br />
extending the biodata approach to other Navy ratings.<br />
REFERENCES<br />
Barge, B.N., and Hough, L.M. (June, 1986). Utility of<br />
biographical data for predicting job performance. In<br />
Leaetta M. Hough (Ed.), Literature review: Utility of<br />
temperament, biodata, and interest assessment for<br />
predicting job performance. Alexandria, VA: U.S. Army<br />
Research Institute for the Behavioral and Social Sciences.<br />
Mumford, M.D., and Owens, W.A. (March, 1987). Methodology<br />
review: Principles, procedures, and findings in the<br />
application of background data measures. Applied<br />
Psychological Measurement.<br />
DEVELOPMENT OF EQUATIONS FOR PREDICTING<br />
TESTING IMPORTANCE OF TASKS<br />
Walter G. Albert<br />
William J. Phalen<br />
Air Force Human Resources Laboratory<br />
INTRODUCTION<br />
The Specialty Knowledge Test (SKT) is an important component<br />
of the Weighted Airman Promotion System (WAPS). SKTs are 100-item<br />
multiple-choice achievement tests designed to measure job<br />
knowledge in various Air Force Specialties (AFSs). They are<br />
written annually for each AFS by teams of four to eight subject<br />
matter experts (SMEs). The SMEs are senior NCOs in the AFS for<br />
which a particular test is being written. A psychologist<br />
experienced in test construction procedures is assigned to each<br />
team to serve as a group facilitator.<br />
A critical part of the test construction process for any SKT<br />
is the preparation of the test outline, which guides the SMEs in<br />
determining how many questions they should write for each<br />
knowledge or duty area of the AFS. The outline used in test<br />
construction is generated in one of two ways. For many years,<br />
the SMEs created their own outline, which is referred to as the<br />
Conventional Test Outline (CTO). Recently, an automated process<br />
has been used to develop outlines for some AFSs. With this<br />
process, the Automated Test Outline (ATO) is available for use<br />
when the test development team arrives. The ATO is generated<br />
from information gathered from testing importance (TI) surveys,<br />
where senior NCOs are asked to rate the importance of each task<br />
as to whether the knowledge(s) required to perform it should be<br />
covered by the SKT.<br />
An important advantage of the ATO procedure over the CTO<br />
procedure is the direct link established between important tasks<br />
performed by incumbents in the AFS and test questions which<br />
address the knowledges required to perform those tasks. The ATO<br />
process has been implemented in several AFSs, but currently it is<br />
regarded as an experimental procedure and is being evaluated<br />
against the CTO. This paper investigates whether information<br />
routinely collected from occupational surveys can be used to<br />
generate accurate TI values for each task. The resulting<br />
prediction equations could then be used to select tasks for<br />
inclusion in testing importance surveys of previously unsurveyed<br />
AFSs or to serve as a surrogate for TI, when a TI survey cannot<br />
be accomplished.<br />
OCCUPATIONAL SURVEYS<br />
An occupational inventory containing up to 2,000 task<br />
statements is administered to a large number of incumbents in<br />
each AFS. These tasks are grouped into seven to twenty duty<br />
areas. Each duty area is comprised of a group of tasks that form<br />
a major activity associated with the job specialty. Each<br />
surveyed job incumbent is requested to estimate the relative<br />
amount of time that he/she spends in performing each task on a<br />
nine-point scale that ranges from "very small amount of time" to<br />
"very large amount of time." No response means that the<br />
incumbent does not perform the task. Each of these ratings is<br />
divided by the sum of the relative time spent values for all of<br />
the tasks in the inventory to get a percentage of time spent<br />
value for the incumbent on each task. From these responses, the<br />
following values are computed for each task: (a) the percentage<br />
of incumbents performing the task (PMP), (b) the percentage of<br />
time spent by incumbents performing the task (PTM), and (c) the<br />
average pay grade of incumbents performing the task (AG).<br />
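The three task-level indices just described (PMP, PTM, AG) could be computed from a matrix of survey responses roughly as below; the data structures and function names are assumptions made for illustration, not the actual survey-processing software.

```python
def task_indices(ratings, pay_grades):
    """Compute per-task occupational-survey indices.

    ratings:    one dict per incumbent mapping task -> relative time
                rating (1-9); tasks not performed are simply absent
                (no response means the task is not performed).
    pay_grades: one numeric pay grade per incumbent, same order.

    Returns, for each task:
      PMP - percentage of incumbents performing the task
      PTM - mean percentage of time spent by incumbents who
            perform the task
      AG  - average pay grade of incumbents performing the task
    """
    tasks = {t for r in ratings for t in r}
    n = len(ratings)
    out = {}
    for t in sorted(tasks):
        pct_times, grades = [], []
        for resp, grade in zip(ratings, pay_grades):
            if t in resp:
                # each rating is divided by the sum of the incumbent's
                # relative-time ratings to give percent time spent
                pct_times.append(100.0 * resp[t] / sum(resp.values()))
                grades.append(grade)
        out[t] = {
            "PMP": 100.0 * len(grades) / n,
            "PTM": sum(pct_times) / len(pct_times),
            "AG": sum(grades) / len(grades),
        }
    return out

ix = task_indices(
    [{"A": 9, "B": 1}, {"A": 5, "B": 5}, {"B": 8}],
    [3, 4, 5],
)
print(ix["A"]["PMP"])  # 2 of 3 incumbents perform task A
```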
Another survey containing the same task list as the<br />
occupational inventory is administered to a large sample of<br />
senior NCOs in each job specialty, who use a nine-point scale to<br />
estimate the difficulty in learning to perform each task<br />
successfully (TD) and the emphasis that should be given in formal<br />
training on each task for newly-hired employees (TE). Raters are<br />
asked to respond to all tasks they are familiar with, even if<br />
some of them are not part of their current job. The TD and TE<br />
values for each task are the means of the responses.<br />
CONVENTIONAL TEST OUTLINE DEVELOPMENT<br />
The CTO is organized according to broad job knowledge areas.<br />
The test development teams spend one to two days creating CTOs<br />
by specifying and weighting knowledge categories based on their<br />
own expertise and their review of appropriate personnel<br />
classification and training documents, such as the Specialty<br />
Training Standard, which describes important duties and tasks for<br />
each job specialty; the Position Classification, which describes<br />
all duties and responsibilities for each job specialty; and the<br />
SKT abstract, which furnishes the following information for each<br />
task in the AFS: PMP, PTM, TE, AG, and TD. The SMEs decide on<br />
the number of test questions to be written on each knowledge<br />
area, based on their determination of the relative testing<br />
importance of that area.<br />
AUTOMATED TEST OUTLINE DEVELOPMENT<br />
The first step in the ATO process is to select those tasks<br />
from the inventory that are performed by at least 50% of the<br />
incumbents or have TE values at least one standard deviation<br />
above the mean. The screening process selects approximately 150<br />
to 250 tasks for each AFS. A survey containing the selected<br />
tasks is administered to approximately 70 senior NCOs to obtain<br />
their opinions on the importance of including a question on the<br />
SKT concerning the knowledge required to successfully perform<br />
each task. The rating scale for testing importance is a seven-point<br />
scale that ranges from "no importance" to "extremely high<br />
importance." The interrater reliability of these ratings is<br />
estimated and deviant raters are eliminated (Lindquist, 1953).<br />
The testing importance (TI) value for each task is the mean of<br />
the ratings after deviant raters have been eliminated.<br />
An ATO is organized by duties and tasks within duties. All<br />
tasks on the TI survey are listed under the appropriate duty.<br />
The TI values are used to weight the duties and tasks. To<br />
accomplish this weighting, the TI values for each task are<br />
squared and summed within a duty. The weight for each duty is<br />
the sum of the squared TI values across all tasks within the duty<br />
divided by the sum of the squared TI values across all duties.<br />
These weights are the percentages of test questions to be<br />
selected to cover the required knowledges to successfully perform<br />
the tasks within each duty.<br />
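The duty-weighting computation above can be sketched as follows; the duty names and TI values are invented for illustration.

```python
def duty_weights(ti_by_duty):
    """Weight duties by summed squared testing-importance (TI) values.

    ti_by_duty maps each duty name to the list of TI values for its
    tasks. The weight for a duty is the sum of its squared TI values
    divided by the sum of squared TI values across all duties, i.e.,
    the fraction of test questions to be allotted to that duty.
    """
    duty_sums = {d: sum(ti * ti for ti in tis)
                 for d, tis in ti_by_duty.items()}
    total = sum(duty_sums.values())
    return {d: s / total for d, s in duty_sums.items()}

w = duty_weights({"Maintenance": [6.0, 5.5], "Admin": [3.0]})
# squared sums are 66.25 and 9.0, giving weights of about .88 and .12
```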
The TI value of each task within a duty is reflected by a<br />
letter from A to D. Tasks are designated as "A" tasks if their<br />
TI values are at least one standard deviation above the mean of<br />
the TI values or if their TI values are at least 6.0. Similarly,<br />
tasks are designated as "D" tasks if their TI values are more<br />
than one standard deviation below the mean of the TI values;<br />
however, all tasks with TI values of at least 4.00 are designated<br />
as "C" tasks. Of the remaining tasks, the upper 50% are<br />
designated "B" tasks and the lower 50% are designated "C" tasks.<br />
SMEs are required to write at least one item to test the job<br />
knowledge required for every "A" task and to write no more than<br />
three items for a single task. Procedures are available to<br />
override these restrictions; however, they require written<br />
justification. Items can be written on "D" tasks only with the<br />
group facilitator's approval.<br />
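The category-assignment rules above can be sketched as follows; the paper does not specify how ties or the exact standard-deviation convention are handled, so those details here are assumptions.

```python
from statistics import mean, pstdev

def categorize_tasks(ti):
    """Assign A-D testing-importance categories to tasks.

    ti maps task -> TI value. "A": at least one SD above the mean TI,
    or TI >= 6.0. "D": more than one SD below the mean, unless
    TI >= 4.0 (such tasks become "C"). The remaining tasks are split
    at the median into "B" (upper half) and "C" (lower half).
    Using the population SD (pstdev) is an assumption here.
    """
    m, sd = mean(ti.values()), pstdev(ti.values())
    cats = {}
    for task, v in ti.items():
        if v >= m + sd or v >= 6.0:
            cats[task] = "A"
        elif v < m - sd:
            cats[task] = "C" if v >= 4.0 else "D"
    # split the remaining tasks: lower half "C", upper half "B"
    rest = sorted((v, t) for t, v in ti.items() if t not in cats)
    half = len(rest) // 2
    for i, (_, task) in enumerate(rest):
        cats[task] = "C" if i < half else "B"
    return cats
```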
PROCEDURE<br />
Tasks for each of 26 AFSs for which testing importance<br />
indices were available (914X0, 753X0, 423X3, 791X0, 423X4, 792X1,<br />
915X0, 908X0, 392X0, 231X2, 542X2, 674X0, 552X0, 324X0, 542X1,<br />
427X3, 321X1E, 112X0, 121X0, 274X0, 321X1G, 241X0, 431X0, 275X0,<br />
566X0, and 231X0) were randomly divided into two samples--one<br />
sample designated the "validation sample" and the other sample<br />
designated the "cross-validation sample." First, the "A" tasks<br />
were randomly split between the two samples, such that each<br />
sample contained approximately an equal number of "A" tasks. The<br />
"B," "C," and "D" tasks were split between the two samples in the<br />
same manner. Regression equations were computed separately for<br />
each validation and cross-validation sample with TI as the<br />
criterion and PMP, PTM, AG, TD, and TE as the predictor<br />
variables. The two sets of regression weights computed for each<br />
AFS were applied to the predictor scores for the cross-validation<br />
sample to generate predicted testing importance (PTI) values.<br />
The predictive efficiency of each set of weights can be measured<br />
by the Pearson coefficient of correlation (r) between TI and PTI.<br />
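The validation scheme just described (fit regression weights on one sample, apply them to the other sample's predictor scores to generate PTI, then correlate TI with PTI) can be sketched with a small ordinary-least-squares routine. A real analysis would use a statistics package; the tiny data below are purely illustrative.

```python
def ols_weights(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y.

    An intercept column of 1s is prepended to X. Gaussian
    elimination with partial pivoting is used, which is adequate
    for a handful of predictors (here: PMP, PTM, AG, TD, TE).
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for i in range(k):                      # forward elimination
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for c in range(i, k):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    w = [0.0] * k
    for i in range(k - 1, -1, -1):          # back substitution
        w[i] = (b[i] - sum(A[i][c] * w[c] for c in range(i + 1, k))) / A[i][i]
    return w

def predict(w, X):
    """Apply fitted weights to predictor scores to produce PTI values."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]

# fit on the validation half, generate PTI for the cross-validation half
val_X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
val_y = [2.0, 2.0, 5.0, 5.0]
w = ols_weights(val_X, val_y)
pti = predict(w, [[2.0, 2.0], [3.0, 3.0]])
```

The Pearson r between the cross-validation sample's observed TI and these PTI values then measures predictive efficiency.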
If the shrinkage in r using the two sets of weights on the<br />
cross-validation sample is statistically nonsignificant (Walker &<br />
Lev, 1953), then the data for both samples can be combined for a<br />
hierarchical clustering analysis. In this procedure, the number<br />
of regression equations is reduced by one at each stage of the<br />
clustering by combining AFSs into groups and combining their<br />
corresponding regression data. The two most similar groups are<br />
combined at each stage, as measured by the resulting loss of<br />
overall predictive efficiency (i.e., the reduction in r between<br />
TI and PTI). The process continues until all data are combined<br />
into a single equation. Analysis of the r losses at each stage<br />
allows identification of the fewest number of regression<br />
equations that can accurately generate PTI values across all<br />
AFSs. In order to measure how well each set of weights would<br />
reproduce an ATO, the weights were used to classify tasks into<br />
the "A-D" categories. PTI values were classified into importance<br />
categories of A through D using a procedure identical to the one<br />
for TI values. Classification accuracy (CA) was measured by<br />
computing the table and formula shown in Figure 1.<br />
Predicted Classification<br />
A B C D<br />
A F11 F12 F13 F14 R1<br />
Actual B F21 F22 F23 F24 R2<br />
Classification C F31 F32 F33 F34 R3<br />
D F41 F42 F43 F44 R4<br />
C1 C2 C3 C4 N<br />
Fij is the frequency in the i,jth cell.<br />
Figure 1. Classification Table and Formula<br />
CA has been weighted such that misclassifications result in<br />
larger penalties as the "distance" between predicted<br />
classification and correct classification becomes greater. This<br />
weighting strategy is reasonable, in that testing importance<br />
differences associated with categories in the table become<br />
greater as the "distance" between the categories increases. The<br />
range of CA values is 0% (every classification has maximum<br />
distance from the correct classification) to 100% (every<br />
classification is correct).<br />
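The exact CA formula from Figure 1 did not survive reproduction, so the sketch below implements one distance-weighted accuracy that matches the stated properties (100% when every classification is correct, 0% when every classification is at maximum distance); treat it as an assumed reading rather than the authors' own formula.

```python
def classification_accuracy(table):
    """Distance-weighted classification accuracy for an A-D table.

    table[i][j] is the number of tasks whose actual category is i
    and predicted category is j (0=A .. 3=D). Each task's penalty
    is the category distance |i - j|, so CA is 100% when every
    prediction is correct and 0% when every prediction is at the
    maximum possible distance.
    NOTE: this is a plausible reconstruction, not the paper's
    exact formula from Figure 1.
    """
    n = sum(sum(row) for row in table)
    max_dist = len(table) - 1
    penalty = sum(f * abs(i - j)
                  for i, row in enumerate(table)
                  for j, f in enumerate(row))
    return 100.0 * (1.0 - penalty / (n * max_dist))
```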
RESULTS<br />
The r's using weights from the cross-validation samples<br />
ranged from .44 (908X0) to .92 (914X0) and the r's using weights<br />
from the validation samples ranged from .42 (566X0) to .91<br />
(121X0). Therefore, there is a great amount of variability among<br />
the AFSs in the ability of a linear function of the five<br />
predictors to account for the variance in TI. Because the<br />
shrinkage in r using the weights from the validation and cross-validation<br />
samples was nonsignificant (α = .05) across all AFSs,<br />
the validation and cross-validation samples were combined for<br />
subsequent analyses.<br />
Classification tables and CA's were also computed for each<br />
set of weights. The CA's using weights from the cross-validation<br />
samples ranged from 70% (908X0) to 92% (121X0); and for the<br />
validation samples, from 68% (231X0) to 92% (121X0). Of the<br />
4,104 tasks classified within the 26 AFSs, only four "D" tasks<br />
were classified as "A" tasks and only four "A" tasks were<br />
classified as "D" tasks. Although it is desirable to have zero A<br />
to D or D to A misclassifications (because the test development<br />
team is being advised incorrectly to write or not write an item),<br />
infrequent misclassifications of this type should not adversely<br />
affect the construction of a valid SKT. The team can rectify<br />
these discrepancies with the permission of the group facilitator.<br />
CA's computed for the combined data ranged from 71% (908X0) to<br />
90% (112X0). In general, the predictive accuracies using<br />
combined samples were higher than those for the validation<br />
samples referred to earlier, but all differences were small (less<br />
than 6%). Only two "D" tasks were classified as "A" tasks and<br />
two "A" tasks were classified as "D" tasks. Squared and<br />
interactive predictor terms were added to the model for each AFS<br />
in an attempt to increase classification accuracy, but only small<br />
increases in accuracy were observed. In fact, for some AFSs,<br />
classification accuracy decreased.<br />
What is adequate classification accuracy in the context of<br />
generating an ATO? The table having the lowest CA value (68%) is<br />
shown in Figure 2. It was generated by applying the 112X0<br />
weights from the validation sample to the cross-validation<br />
sample. The impact of the misclassifications in the table is<br />
probably not too severe when it is recalled that the AT0 is a<br />
guide for SMEs to use in developing an SKT, and they are free to<br />
select tasks from any of the importance categories within the<br />
restrictions delineated above.<br />
Figure 2. Classification Table with Lowest CA Value<br />
The r's for the combined data for each AFS ranged from .51<br />
(908X0) to .91 (112X0). A hierarchical clustering of the<br />
regression equations for ail 26 AFSs showed small decreases in r<br />
throughout most of the clustering process. For example, the<br />
overall r dropped from .84 at the 26-group stage (i.e., a<br />
separate regression equation for each of the 26 AFSs) to .79 at<br />
the 5-group stage. Thereafter, the drops in r to the 1-group<br />
stage were .02, .02, .04, and .12, respectively.<br />
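Chaining the reported drops recovers the overall r at each stage. The intermediate stage labels (4, 3, and 2 groups) are assumed; the text says only that the drops run from the 5-group stage down to the 1-group stage:

```python
# Overall r at each clustering stage, reconstructed from the values
# reported in the text: .84 at 26 groups, .79 at 5 groups, then
# successive drops of .02, .02, .04, and .12 down to 1 group.
r = {26: 0.84, 5: 0.79}
drops = [0.02, 0.02, 0.04, 0.12]
stages = [4, 3, 2, 1]          # assumed labels for the later stages
value = r[5]
for stage, drop in zip(stages, drops):
    value = round(value - drop, 2)
    r[stage] = value

print(r)  # {26: 0.84, 5: 0.79, 4: 0.77, 3: 0.75, 2: 0.71, 1: 0.59}
```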
The gradual drop in r's until the clustering at the 1-group<br />
stage makes identification of an "optimal clustering stage"<br />
difficult. Therefore, classification was also examined at<br />
various stages. The CA's of equations at the 1-group stage<br />
ranged from 62% (908X0) to 89% (112X0); however, the second<br />
smallest CA was 71% (321X1E). In comparison, the CA's at the<br />
26-group stage ranged from 71% (908X0) to 90% (112X0), with the<br />
second smallest CA being 72% (231X0). Therefore, the range of<br />
the CA's doesn't change much between the two extremes of the<br />
clustering process. At the 26-group stage, only 2 "A" tasks were<br />
classified as "D" tasks and only 2 "D" tasks were classified as<br />
"A" tasks. With the exception of one AFS (908X0), where six "D"<br />
tasks were classified as "A" tasks and one "A" task was classified<br />
as a "D" task, there were only three "A" tasks classified as "D"<br />
tasks and two "D" tasks classified as "A" tasks over all AFSs at<br />
the 1-group stage.<br />
A Wilcoxon matched-pairs signed-ranks test (Siegel, 1956)<br />
was used to compare the differences in CA's between the 26-group<br />
stage and the 5-group stage and between the 5-group stage<br />
and the 1-group stage. There was a statistically significant<br />
difference (α = .05) between the 26-group and 5-group stages, but<br />
not between the 5-group and 1-group stages. Although<br />
significantly better classifications result from the use of 26<br />
equations, 20 of 26 AFSs had differences of 5% or less (max = 13%).<br />
If generalized equations are to be used to classify tasks in<br />
other AFSs where TI data are not available, it appears promising<br />
that a single prediction equation could generate adequate testing<br />
importance values. Further analyses are being conducted to<br />
identify the highest and lowest stages that are significantly<br />
different from the 26-group and 1-group stages, respectively.<br />
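The Wilcoxon matched-pairs signed-ranks statistic used above can be computed with the standard library alone. The paired CA values below are invented, and in practice a library routine such as scipy.stats.wilcoxon would normally be used; this sketch computes only the T statistic, not its significance:

```python
# Wilcoxon matched-pairs signed-ranks statistic T: rank the absolute
# paired differences (zeros dropped, ties share the mean rank), then
# take the smaller of the positive-rank and negative-rank sums.
def wilcoxon_T(x, y):
    diffs = [a - b for a, b in zip(x, y) if a != b]
    absd = sorted((abs(d), i) for i, d in enumerate(diffs))
    ranks = [0.0] * len(diffs)
    j = 0
    while j < len(absd):
        k = j
        while k < len(absd) and absd[k][0] == absd[j][0]:
            k += 1
        mean_rank = (j + 1 + k) / 2.0      # mean of ranks j+1 .. k
        for m in range(j, k):
            ranks[absd[m][1]] = mean_rank
        j = k
    pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(pos, neg)

# Hypothetical CA percentages at the 26-group vs. 5-group stage:
ca_26 = [90, 88, 84, 80, 78, 75, 73, 71]
ca_5  = [89, 85, 84, 78, 77, 70, 74, 68]
print(wilcoxon_T(ca_26, ca_5))  # 2.0 for these invented pairs
```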
CONCLUSIONS<br />
A large amount of the variance in TI was accounted for by<br />
linear combinations of the task-level predictors. The stability<br />
of least squares weights within each of the 26 AFSs was<br />
demonstrated. Prediction equations adequately classified tasks<br />
according to testing importance with very few A to D or D to A<br />
misclassifications. Use of squared and interactive predictor<br />
terms added little to predictive efficiency. A hierarchical<br />
clustering of the regression equations developed for each AFS<br />
showed small decreases in predictive efficiency throughout most<br />
of the clustering process. Preliminary results indicate that a<br />
single prediction equation may do an adequate job of classifying<br />
tasks on testing importance across all AFSs.<br />
REFERENCES<br />
Lindquist, E. F. (1953). Design and analysis of experiments in<br />
psychology and education. Boston: Houghton Mifflin Company.<br />
Walker, H. M., & Lev, J. (1953). Statistical inference. New York:<br />
Henry Holt and Company.<br />
Siegel, S. (1956). Nonparametric statistics for the behavioral<br />
sciences. New York: McGraw-Hill.<br />
ESTIMATING TESTING IMPORTANCE OF TASKS<br />
BY DIRECT TASK FACTOR WEIGHTING<br />
Authors:<br />
William J. Phalen, Air Force Human Resources Laboratory<br />
Walter G. Albert, Air Force Human Resources Laboratory<br />
Darryl K. Hand, Metrica, Inc.<br />
Martin J. Dittmar, Metrica, Inc.<br />
INTRODUCTION<br />
This paper is one of a series of presentations delivered at the current and previous two <strong>Military</strong> <strong>Testing</strong><br />
<strong>Association</strong> Conferences to document R&D of an automated, task-data-based outline development procedure<br />
for Air Force Specialty Knowledge Tests (SKTs). A companion paper to this one (Albert & Phalen, 1990)<br />
provides a brief description of the automated test outline (ATO) procedure. This paper will focus on that part<br />
of the ATO procedure having to do with the selection process by which 150 to 250 tasks are selected from a job<br />
inventory containing up to 2,000 tasks for inclusion in a <strong>Testing</strong> Importance Survey booklet. Up to now, rule-based<br />
screening procedures have been used to identify potentially important tasks to include in the survey, with<br />
cutoffs on percent of members performing each task at the E-5 and E-6/7 paygrade levels and on the<br />
recommended training emphasis index being the primary selection criteria. A little over a year ago, research<br />
was initiated to derive and validate a minimal subset of regression equations for predicting the SME-furnished<br />
testing importance ratings in 28 AFSs with linear combinations of five task-level predictor variables, i.e., percent<br />
of members performing (PMP), percent time spent by members performing (PTM), average paygrade of<br />
members performing (AG), task learning difficulty (TD), and field-recommended task training emphasis for first-termers<br />
(TE). So far, it appears that possibly one, but not more than three, generalized regression equations<br />
may adequately classify tasks into their appropriate testing importance categories. These equations will,<br />
hopefully, perform several important functions. First of all, they should provide a more accurate and defensible<br />
task selection procedure for surveying AFSs that have not been previously surveyed. Secondly, the predicted<br />
testing importance (PTI) values generated by the equations should be able to serve as surrogate testing<br />
importance indices when time or budget constraints prevent the administration of testing importance surveys.<br />
Thirdly, when a new job inventory is developed and administered in an AFS whose testing importance data are<br />
based on the old job inventory tasks, the new data for the predictor variables should be available to use in<br />
conjunction with one of the generalized regression equations to generate PTI values for all the tasks in the new<br />
job inventory.<br />
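The least-squares step described above can be sketched as follows. The task values and TI criterion are invented, and the solver is a plain normal-equations implementation rather than whatever package the authors used:

```python
# Least-squares fit of testing importance (TI) on the five task-level
# predictors (PMP, PTM, AG, TD, TE). All data are invented for
# illustration only.
def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Intercept plus weights minimizing squared prediction error."""
    Z = [[1.0] + row for row in X]          # prepend intercept column
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(XtX, Xty)

# One row per task: PMP, PTM, AG, TD, TE (hypothetical values).
X = [[60, 4.0, 4.2, 5.1, 6.8], [20, 1.0, 5.5, 3.0, 1.2],
     [75, 6.5, 4.0, 6.2, 5.9], [40, 2.2, 4.8, 4.0, 4.6],
     [55, 3.1, 4.5, 5.5, 3.2], [10, 0.5, 6.0, 2.5, 2.1],
     [85, 7.0, 3.8, 6.8, 7.7], [30, 1.8, 5.2, 3.5, 3.9]]
y = [5.0, 2.1, 6.8, 3.5, 4.9, 1.2, 7.5, 2.9]    # SME-rated TI
beta = ols(X, y)
pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]
```

A generalized equation would be the same fit run on tasks pooled across AFSs, with the resulting beta applied to the predictor values of a previously unsurveyed AFS.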
But the application of these PTI equations also raises several pertinent questions: (1) How can we<br />
determine which PTI equation should be used to generate PTI values for a previously unsurveyed AFS? (2) Can<br />
SMEs provide direct estimates of AFS-specific weights for the five predictor variables that are nearly as accurate<br />
for an AFS as the generalized regression weights? (3) Is it possible that the need for regression-generated or<br />
SME-derived weighting is obviated by simple unit weighting of the five predictor variables? The potential value<br />
of direct estimation of predictor weights by SMEs was anticipated back in 1987; accordingly, an SKT Task Factor<br />
<strong>Testing</strong> Importance Survey booklet was developed and administered to the SMEs in all AFSs for which SKTs<br />
were developed in 1988, 1989, and 1990 (to date). The booklet used in 1988 contained seven factors, the two<br />
additional ones being “consequences of inadequate performance” (CIP) and “requirement for prompt<br />
performance” (RPP), the latter being a rewording of the old “task delay tolerance” factor in order to reverse the<br />
direction of the scale and make it consistent for all factors. In 1989, it was decided to limit the task factors<br />
surveyed to the five which were routinely surveyed by the USAF Occupational Measurement Squadron<br />
(USAFOMS); thus, CIP and RPP were dropped. The elimination of the CIP and RPP factors also made it<br />
possible to assess the effect of their presence or absence on the other five factors. In 1990, the CIP and RPP<br />
factors were restored to the survey in order to introduce more variance into the profiles of the SME-furnished<br />
factor weights and thus eliminate some fuzziness from the clustering solution. The availability of data on the<br />
same seven factors for the same AFSs in 1988 and 1990 made it possible to assess the stability of factor weights<br />
over a two-year period, assuming, of course, that the SMEs in both periods were equally representative of their<br />
AFSs.<br />
THE SURVEY INSTRUMENT<br />
The SKT Task Factor <strong>Testing</strong> Importance Survey is administered to all SMEs who have been sent by<br />
their respective commands to participate in the development of SKTs in their AFSs. To date, approximately<br />
1,000 SMEs have been surveyed. The survey is group-administered by a member of the USAFOMS test<br />
development staff immediately following the SKT in-briefing. It takes about 10 minutes to read the instructions,<br />
fill in the background section, and provide ratings on the seven listed factors (1 to 7 scale). In order to clearly<br />
communicate what the SKT task factor rating process is all about, the rating instructions, scale, and factor<br />
definitions as they appear in the survey booklet are shown in Figure 1.<br />
RESULTS<br />
A. Reliability Analysis. There were 35 AFSs in which the SKT Task Factor <strong>Testing</strong> Importance Survey<br />
was administered in 1988 and again in 1990. In most instances, no SMEs appeared in both survey<br />
samples. As shown in Table 1, the average number of raters per AFS in 1988 was 3.50, and in 1990,<br />
the average was 3.59. The average correlation between the mean factor profiles (across seven<br />
factors) for the 35 AFSs was .4841 (correlations averaged through z). A value this high was<br />
considered very acceptable, especially since it involved a two-year time interval between<br />
administrations and small numbers of different raters per AFS at both points in time. This value<br />
compares very well with the average test-retest reliability of .5835 that was obtained on task-level<br />
testing importance ratings for 26 raters in 20 AFSs with a 3-to-4-month interval (Weissmuller,<br />
Dittmar, & Phalen, 1988). These raters were surveyed by mail and were later surveyed again when<br />
they were selected to serve on an SKT development team. The difference between the two<br />
reliability coefficients was found to be nonsignificant (p = .4337). As a further test, the 1988-to-1990<br />
factor profile correlation (r = .4841) was treated as a group measure of interrater reliability (Rkk)<br />
with no time interval involved, and the Rkk was reduced to a single-rater reliability value (R11) for<br />
comparison with the mean R11 value for task-level testing importance ratings across all 28 AFSs that<br />
had been surveyed. The computed R11 value for a composite reliability (Rkk) of .4841 based on an<br />
overall average of 3.54 raters per factor profile was .2649. The average R11 for the task-level testing<br />
importance ratings across the 28 surveyed AFSs was .2640, an almost identical value. Yet, the<br />
former involved a two-year interval and the latter is a concurrent measure of internal consistency.<br />
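Stepping a composite reliability for several raters down to a single-rater value is conventionally done with the Spearman-Brown formula run in reverse. The paper does not state which formula it used, and the textbook version below does not exactly reproduce the reported single-rater value with these inputs, so treat this as an illustration of the relationship rather than a re-derivation:

```python
# Spearman-Brown: step a composite reliability R_kk (k raters) down
# to the implied single-rater reliability R_11, and back up. The
# inputs are the composite reliability and mean rater count quoted
# in the text; the paper's own computation may have differed.
def step_down(R_kk, k):
    return R_kk / (k - (k - 1) * R_kk)

def step_up(R_11, k):
    return (k * R_11) / (1 + (k - 1) * R_11)

r11 = step_down(0.4841, 3.54)
print(round(r11, 4))  # 0.2095 with these inputs
```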
B. Effect of Adding or Removing the CIP and RPP Factors. Two tests were<br />
applied to determine whether the relative weights of the common five factors were affected by<br />
adding or removing the additional two factors (i.e., CIP and RPP). In the first test, each factor was<br />
given an overall rank in terms of its mean rating in 1989 (five-factor survey) and its mean rating in<br />
1988 and 1990 separately (seven-factor surveys). The Mann-Whitney test was applied to assess the<br />
differences in the sums of ranks. The mean ratings of the PTM, AG, and TD factors were relatively<br />
unaffected by the presence or absence of the additional factors, but PMP and TE showed significant<br />
shifts in their mean ratings (p < .01). Both were significantly higher when CIP and RPP were<br />
absent (or significantly lower when CIP and RPP were present). A test was also applied to<br />
determine whether the sizes of the differences between the PMP and TE means in the five-factor<br />
vs. the seven-factor environment were related to the sizes of the mean CIP and RPP values.<br />
Regression equations of the form PMP(5) - PMP(7) = W1 CIP + W2 RPP were applied. None of the<br />
regression results were found to be significant. Thus, while it can be said that PMP and TE were<br />
affected in a given direction by the presence or absence of CIP and RPP, there was no indication<br />
that the level of difference was proportional to the level of CIP and RPP.<br />
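The rank-sum comparison can be sketched as follows. The sample values are invented stand-ins for mean PMP ratings under the two survey formats, and this computes only the Mann-Whitney U statistic, without the significance lookup:

```python
# Mann-Whitney U from two independent samples: pool the values, rank
# them (ties share the mean rank), sum the ranks of the first sample,
# and convert the rank sum to U.
def mann_whitney_U(x, y):
    data = [(v, 0) for v in x] + [(v, 1) for v in y]
    data.sort(key=lambda t: t[0])
    n = len(data)
    ranks = [0.0] * n
    j = 0
    while j < n:
        k = j
        while k < n and data[k][0] == data[j][0]:
            k += 1
        mean_rank = (j + 1 + k) / 2.0
        for m in range(j, k):
            ranks[m] = mean_rank
        j = k
    R1 = sum(r for r, (v, src) in zip(ranks, data) if src == 0)
    n1, n2 = len(x), len(y)
    U1 = R1 - n1 * (n1 + 1) / 2.0
    return min(U1, n1 * n2 - U1)

# Hypothetical mean PMP ratings: five-factor vs. seven-factor survey.
five  = [6.1, 5.9, 6.3, 6.0, 5.8]
seven = [5.2, 5.5, 5.0, 5.4, 5.6]
print(mann_whitney_U(five, seven))  # 0.0: the samples do not overlap
```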
SECTION II. INSTRUCTIONS<br />
Imagine that you have been asked to review the job-task statements in the most recent USAF Job Inventory administered<br />
in the career field for which you are developing SKTs. This survey could contain anywhere from 500 to 1,200 or more task<br />
statements. Next, assume that you have been asked to rate each task statement indicating how important it is to include the job<br />
knowledges needed to perform that task on a Specialty Knowledge Test. A task would be rated high in testing importance if it<br />
requires knowledges that are critical to successful job performance within the career field.<br />
You are in luck, however. You are not being asked to provide these 500 or more ratings. Instead, seven factors (or types<br />
of information) have been proposed as possible factors in determining the testing importance level of a task. These seven factors,<br />
along with their descriptions, are shown in Section II, SKT TASK FACTOR TESTING IMPORTANCE RATING SCALE. You<br />
are asked to rate each task factor on how important it is to consider this factor when assigning a testing importance rating to the<br />
tasks performed by airmen in the Air Force Specialty for which you are developing SKTs. Using the scale provided, determine the<br />
most appropriate rating and record your rating in the column provided.<br />
Rating Factor<br />
SECTION II: SKT TASK FACTOR TESTING IMPORTANCE RATINGS<br />
RATING SCALE FOR FACTORS IN TESTING IMPORTANCE<br />
This factor has:<br />
7 = Extremely High Importance<br />
6 = High Importance<br />
5 = Above Average Importance<br />
4 = Average Importance<br />
3 = Below Average Importance<br />
2 = Low Importance<br />
1 = No Importance<br />
- 1. Percent Members Performing: a measure of the proportion of all airmen who perform the task<br />
- 2. Average Percent Time Spent: a measure of the proportion of the total work time that airmen in the AFS spend<br />
performing the task<br />
- 3. Average Grade: the average grade of all airmen who perform the task.<br />
- 4. Learning Difficulty: a measure of the relative length of time required to learn to perform the task properly.<br />
- 5. Consequences of Inadequate Performance: a measure of the probable seriousness of failing to perform the task<br />
properly. The impact is measured in terms of possible injury or death, damage to equipment, wasted supplies or lost<br />
work-hours, etc.<br />
- 6. Requirement for Prompt Performance: a measure of the length of time from the moment that an airman is aware<br />
that a task will need to be done up to the point at which the task MUST be performed. In other words, does the<br />
airman have to be able to perform the task immediately, or does he or she have time to consult a manual or seek<br />
guidance?<br />
- 7. Field-Recommended Entry-Level Training Emphasis: a measure of how strongly NCOs in the field have<br />
recommended the task for inclusion in formal, structured training programs for entry-level airmen. Structured<br />
training may include resident technical school, on-the-job training (OJT), field training detachments (FTDs), or<br />
career development courses (CDCs).<br />
Figure 1. SKT Rating Form<br />
C. Clustering of Factor Profiles vs. Clustering of PTI Regression Equations. One objective of<br />
gathering task factor ratings from SMEs was to provide a means of determining which one of<br />
several generalized regression equations should be applied to previously unsurveyed AFSs to<br />
select the appropriate set of tasks for inclusion in a Task <strong>Testing</strong> Importance Survey. If AFS<br />
factor profiles produced a clustering of AFSs that corresponded to the clustering of AFSs on<br />
similarity of regression equations, then regression equation group membership could be defined<br />
for task factor clusters of AFSs for which there were no regression equations. Various attempts<br />
were made to produce corresponding clustering solutions, but no adequate match could be<br />
generated. A major impediment was the fact that even in the case in which the input sample<br />
of factor profiles contained the maximum amount of variance (1988, 1989, and 1990 combined),<br />
the “between” overlap for the last two groups to merge was 86.3% and the total sample “within”<br />
overlap was 93.2%. On the other hand, the clustering of regression equations did not seem to<br />
indicate a need for more than one equation. Thus, a lack of variance was present in these data,<br />
as well. If additional research indicates that only one overall regression equation is needed for<br />
all AFSs, then the need for a procedure to select the appropriate regression equation for a<br />
previously unsurveyed AFS vanishes.<br />
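Had the two clusterings corresponded, the selection rule could have looked something like the nearest-centroid sketch below. The centroids, profiles, and group names are all invented; the paper proposes no specific matching algorithm:

```python
# Nearest-profile assignment: pick, for an unsurveyed AFS, the
# regression-equation group whose centroid factor profile correlates
# most highly (Pearson r) with the AFS's own 7-factor profile.
# All numbers below are hypothetical.
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

centroids = {                      # mean 7-factor profiles per group
    "group1": [6.2, 5.8, 3.1, 5.5, 6.4, 4.9, 6.0],
    "group2": [4.0, 4.2, 5.0, 6.3, 5.1, 6.2, 4.4],
}
new_afs = [6.0, 5.5, 3.4, 5.2, 6.1, 5.0, 5.8]   # unsurveyed AFS profile
best = max(centroids, key=lambda g: pearson(new_afs, centroids[g]))
print(best)  # group1
```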
D. Comparison of Regression- vs. Factor-Weighted Equations for Predicting Testing Importance<br />
of Tasks. Table 1 shows the predictive efficiency of the AFS-specific PTI regression equations<br />
for 25 AFSs for which task-level testing importance indices were available and for which SMEs<br />
had provided factor weights in 1988, 1989, or 1990. Since the derivation and validation of the<br />
regression equations and their predictive efficiency are discussed in detail in a companion paper<br />
(Albert & Phalen, 1990), the correlations of predicted and actual testing importance values for<br />
the 25 AFSs are reported here only for their comparison with the correlations produced by the<br />
SME-based factor-weighting approach (which standardizes each task factor before applying the<br />
factor weights and sums the cross-products into a testing importance composite). In Table 1,<br />
only the highest correlations computed for the 1988, 1989, and 1990 factor weights and all<br />
possible combinations thereof are reported in order to show the highest correlations this<br />
approach can hope to produce for comparison against the best alternative, i.e., the least-squares<br />
fit of task-level indices for the five task factors (predictors) to the indices of task-level testing<br />
importance (criterion).<br />
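The factor-weighting computation described in parentheses above (standardize each factor across tasks, apply the SME factor weight, and sum the cross-products) can be sketched as follows, with invented task data and weights:

```python
# SME factor-weighted testing-importance composite: z-score each
# factor over tasks, weight by the mean SME factor rating, and sum.
# Task values and weights are illustrative only.
def zscores(col):
    n = len(col)
    m = sum(col) / n
    sd = (sum((v - m) ** 2 for v in col) / n) ** 0.5
    return [(v - m) / sd for v in col]

def composite(task_rows, weights):
    cols = list(zip(*task_rows))              # one column per factor
    zcols = [zscores(list(c)) for c in cols]
    return [sum(w * z[i] for w, z in zip(weights, zcols))
            for i in range(len(task_rows))]

tasks = [[60, 4.0, 4.2, 5.1, 6.0],    # PMP, PTM, AG, TD, TE per task
         [20, 1.0, 5.5, 3.0, 2.0],
         [75, 6.5, 4.0, 6.2, 7.5],
         [40, 2.2, 4.8, 4.0, 3.5]]
sme_weights = [6.0, 5.5, 3.0, 5.0, 6.5]      # mean 1-7 factor ratings
scores = composite(tasks, sme_weights)
ranking = sorted(range(len(tasks)), key=lambda i: -scores[i])
print(ranking)  # [2, 0, 3, 1]
```

Unit weighting, discussed in section E, is the same computation with every weight set to 1.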
For some unexplained reason, the 1990 factor weights uniformly produced<br />
lower correlations than the 1988 weights. Overall, the factor-derived correlations averaged to<br />
a respectable r = .602 at the E-5 level and r = .606 at the E-6/7 level, compared to r = .798<br />
and .786 for the E-5 and E-6/7 regression-derived correlations, respectively. The difference is<br />
significant (p < .01) in both cases, but the real difference is in the lack of uniformity of fit of<br />
the factor-derived approach; i.e., in some cases, it matches the regression-derived correlations<br />
quite well, and in other cases rather poorly. It appears that the SME-furnished factor-weighting<br />
approach is not an acceptable alternative to the regression approach, as long as the regression<br />
alternative remains supportable.<br />
E. Differential vs. Unit Weighting of Factors. Because there was little variance in the SME-<br />
derived factor weights, and substantial positive correlations existed between the five task factors<br />
and the testing importance criterion, with the exception of average grade (Weissmuller, Dittmar,<br />
& Phalen, 1989), there was a distinct possibility that a unit-weighted linear composite of the<br />
standardized task factors might do almost as well as the differentially weighted composite. The<br />
effect of unit weighting on the correlations with testing importance is shown in Table 1 under<br />
the heading “Unit.” The unit weighting approach produced correlations for both the E-5 and<br />
E-6/7 levels that were generally close to the correlations derived from differential weighting by<br />
SMEs, with only two instances showing a substantial drop in correlation (both within the same<br />
AFS); but 14 correlations based on unit weighting were actually higher than those based on<br />
differential weighting. Tests of significance of difference between the r's for differential and<br />
unit weighting at the E-5 and E-6/7 levels (.602 vs. .565, and .606 vs. .582, respectively) yielded<br />
no significant differences. These findings clearly indicate that there is virtually nothing to be<br />
gained by continuing to gather factor importance ratings from SMEs, since unit weighting of<br />
the factors is equally effective.<br />
Table 1. r of TI with PTI<br />
DISCUSSION<br />
The findings of this study suggest one positive conclusion and three negative conclusions. The positive<br />
conclusion is: (1) Factor importance weights display good reliability, even when the interval between<br />
administrations is as long as two years. The negative conclusions are: (1) The factor importance weighting<br />
approach does not yield correlations with task-level testing importance that would permit abandonment of the<br />
more rigorous regression approach, which requires the administration of task-level testing importance surveys<br />
in order to obtain criterion data for generating a least-squares solution. (2) There does not appear to be<br />
sufficient variance in the profiles of factor weights to provide a clustering of AFSs that corresponds sufficiently<br />
well with the clustering of AFS-specific regression equations; therefore, the clustering of profiles of factor weights<br />
is not useful for indicating which generalized regression equation should be used for a particular AFS (assuming<br />
that more than one equation will be needed to adequately cover all AFSs). (3) Since unit weighting of the<br />
testing importance factors is virtually as good as SME-furnished differential weighting, there is little to be gained<br />
by continuing to gather factor importance ratings from SMEs.<br />
RECOMMENDATIONS<br />
Discontinue administration of the <strong>Testing</strong> Importance Factors Survey and concentrate instead on<br />
improving the predictive efficiency and classification accuracy of the regression-based procedure.<br />
REFERENCES<br />
Albert, W.G., & Phalen, W.J. (1990). Development of equations for predicting testing importance of<br />
tasks. Proceedings of the 32nd Annual Conference of the <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>,<br />
Orange Beach, AL.<br />
Weissmuller, J.J., Dittmar, M.J., & Phalen, W.J. (1989). Automated test outline development: research<br />
findings (AFHRL-TP-88-70, AD-215 401). Brooks AFB, TX: Manpower and Personnel<br />
Division, Air Force Human Resources Laboratory.<br />
Upper Body Strength and Performance in Army Enlisted MOS<br />
Elizabeth J. Brady and Michael G. Rumsey<br />
Army Research Institute<br />
Introduction<br />
Cognitive testing for selection and classification purposes<br />
has a long and distinguished history in the military services.<br />
The link between cognitive ability and soldier performance has by<br />
now been firmly established, providing a reasonably solid basis<br />
for this type of testing.<br />
The concept of screening on the basis of physical strength<br />
capability is less firmly established. A solid empirical<br />
foundation linking physical strength to overall job performance<br />
does not as yet exist. Yet for those jobs requiring lifting or<br />
moving heavy physical objects, the question naturally arises as<br />
to whether some minimal degree of physical strength might be an<br />
appropriate prerequisite.<br />
This question began to receive special attention in the<br />
1970's, as the number of women serving in the military, as well<br />
as the number of specialties open to women, increased<br />
dramatically. In 1976, the General Accounting Office recommended<br />
that the services develop common physical standards for males and<br />
females in specialties where physical strength attributes were<br />
relevant to effective performance. In 1982, a Women in the Army<br />
policy review evaluated the strength requirements of a variety of<br />
jobs. Then, in 1984, the Army began administering the <strong>Military</strong><br />
Entrance Physical Strength Capacity Test (MEPSCAT) to each<br />
applicant for enlistment at the <strong>Military</strong> Entrance Processing<br />
Stations (MEPS). Results of the test were used for job placement<br />
counseling rather than for determining an individual's<br />
qualification for entering any particular job.<br />
In 1987, the Army's personnel office, the Office of the<br />
Deputy Chief of Staff for Personnel (ODCSPER), determined that it<br />
was time to review its physical strength screening process. The<br />
question of most immediate concern was: are the benefits of<br />
screening worth the effort? The initial approach taken to<br />
answering the question was to explore whether there was any<br />
evidence that physical strength limitations were perceived to<br />
interfere in any substantial way with job performance in the<br />
Army.<br />
Presented at the meeting of the <strong>Military</strong> <strong>Testing</strong><br />
<strong>Association</strong>, November, 1990. All statements expressed in this<br />
paper are those of the authors and do not necessarily reflect the<br />
official opinions or policies of the U.S. Army Research Institute<br />
or the Department of the Army.<br />
The ODCSPER directed that a Physical Requirements<br />
Questionnaire (PRQ) be developed and administered to determine<br />
the extent to which job incumbents were perceived, by themselves<br />
or their supervisors, as having difficulty in performing their<br />
job due to upper body strength limitations. Accordingly, the<br />
U.S. Army Research Institute, in collaboration with the Enlisted<br />
Accessions Division of the ODCSPER and the Exercise Physiology<br />
Division of the Army Research Institute of Environmental<br />
Medicine, developed a 7-item supervisor version and an 11-item<br />
incumbent version of this questionnaire. Only the results from<br />
the incumbent version will be discussed in this paper.<br />
This paper will assess the extent to which insufficient<br />
upper body strength is perceived to interfere significantly with<br />
job performance in a representative sample of Army jobs. These<br />
self-report data will also be related to MEPSCAT scores, an<br />
objective measure of upper body strength.<br />
Method<br />
Subjects. The total sample size consisted of 11,069 (88%<br />
male, 12% female) job incumbents across 21 <strong>Military</strong> Occupational<br />
Specialties (MOS). There were 65% white, 27% black, 4% hispanic,<br />
and 4% other in this sample. The mean age for 86% of the males<br />
was 20, and 60% of the females had a mean age of 21. Due to<br />
missing data, the actual sample sizes used in the following<br />
analyses may be somewhat smaller.<br />
Physical Requirements Questionnaire. The incumbent version<br />
of the PRQ contains 11 items, which consist of 10 multiple choice<br />
and one short answer. This version was pretested in April 1988,<br />
as part of a field test of Project A second tour measures. It<br />
was administered to 79 second tour soldiers (36 to 60 months in<br />
service) in three MOS (13B, cannon crewmember; 88M, motor<br />
transport operator; and 95B, military police). The results of<br />
the pretest indicated that the PRQ was easy to administer, that<br />
the response options were reasonable, and that it could be<br />
completed in less than 10 minutes.<br />
Physical Demand Categories. The purpose of the physical<br />
demand categories is to assign soldiers to jobs for which they<br />
are physically qualified. The categories are based on upper body<br />
strength. According to AR 611-201, the five categories are: (1)<br />
LIGHT - occasionally lift 20 pounds and frequently lift 10<br />
pounds; (2) MEDIUM - occasionally lift 50 pounds and frequently<br />
lift 25 pounds; (3) MODERATELY HEAVY - occasionally lift 80<br />
pounds and frequently lift 40 pounds; (4) HEAVY - occasionally<br />
lift a maximum of 100 pounds and frequently lift 50 pounds; and<br />
(5) VERY HEAVY - occasionally lift over 100 pounds and<br />
frequently lift 50 pounds. As shown in Table 1, the Project A<br />
sample has 14 Very Heavy MOS, 1 Heavy MOS, 4 Moderately Heavy<br />
MOS, 2 Medium MOS, and no Light MOS.<br />
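The AR 611-201 thresholds just listed can be captured as a small lookup table. The qualifies() helper is hypothetical (the text notes that MEPSCAT results were used for counseling, not as a qualification gate), and "over 100 pounds" is encoded here as a 101-pound minimum:

```python
# AR 611-201 physical demand categories as (occasional-lift,
# frequent-lift) pound thresholds, per the text above.
CATEGORIES = {
    "LIGHT":            (20, 10),
    "MEDIUM":           (50, 25),
    "MODERATELY HEAVY": (80, 40),
    "HEAVY":            (100, 50),
    "VERY HEAVY":       (101, 50),   # "over 100" occasionally
}

def qualifies(category, occasional_capacity, frequent_capacity):
    """Hypothetical check: do a soldier's demonstrated lift
    capacities meet a category's occasional and frequent thresholds?"""
    occ, freq = CATEGORIES[category]
    return occasional_capacity >= occ and frequent_capacity >= freq

print(qualifies("MEDIUM", 110, 50))      # True: a 110-lb lift suffices
print(qualifies("VERY HEAVY", 100, 50))  # False: 100 lb is not "over 100"
```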
Table 1<br />
MOS by Physical Demand Categories<br />
VERY HEAVY 11B Infantryman<br />
12B Combat Engineer<br />
13B Cannon Crewmember<br />
19E M48-M60 Armor Crewman<br />
19K M1 Armor Crewman<br />
27E Tow/Dragon Repairer<br />
31C Single Channel Radio Operator<br />
51B Carpentry & Masonry Specialist<br />
54B Chemical Operations Specialist<br />
55B Ammunition Specialist<br />
63B Light Wheel Vehicle Mechanic<br />
67N Utility Helicopter Repairer<br />
88M Motor Transport Operator<br />
94B Food Service Specialist<br />
HEAVY 76Y Unit Supply Specialist<br />
MODERATELY HEAVY 16S MANPADS & PMS Crewmember<br />
29E Radio Repairer<br />
91A Medical Specialist<br />
95B <strong>Military</strong> Police<br />
MEDIUM 71L Administrative Specialist<br />
96B Intelligence Analyst<br />
Data Collection. The objective was to collect questionnaire<br />
responses from a large number of first tour incumbents in a<br />
reasonably representative set of Army MOS, or jobs. It was<br />
determined that the most effective means of achieving this<br />
objective was to administer the PRQ as part of a large-scale data<br />
collection being conducted as one stage in a research effort,<br />
known as Project A, to improve the Army's enlisted selection and<br />
classification system. Between July, 1988 and February, 1989,<br />
the PRQ was administered to 11,069 soldiers in 21 MOS chosen to<br />
reasonably represent the full set of Army MOS for Project A<br />
purposes.<br />
Results<br />
A factor analysis with an orthogonal varimax rotation<br />
yielded two factors, which accounted for 46% of the common<br />
variance. The first factor includes items which deal with the<br />
individual's inability to get the job done; the second factor<br />
includes items which tend to focus more on ways in which to<br />
improve job performance.<br />
Factor 1<br />
For purposes of this paper, one representative item was<br />
selected from each scale to highlight some of the<br />
principal results that emerged from our initial analyses of these<br />
data. From the first factor, the item selected reads as follows:<br />
How many times in the past six months have you had insufficient<br />
upper body strength to complete a task assignment in your MOS?<br />
The response options for question 1, and the proportion of<br />
respondents choosing each option, are shown below:<br />
Proportion<br />
Option Male Female Total<br />
1. 10 or more 7 8 7<br />
2. 5 to 9 3 5 3<br />
3. 2 to 4 8 17 9<br />
4. 1 6 6 6<br />
5. None 76 64 75<br />
Before further analyses were conducted, response options<br />
were grouped into two categories based on the degree of<br />
difficulty experienced by the respondent in performing tasks:<br />
high difficulty (options 1 and 2) and low difficulty (responses<br />
3, 4 and 5). Thus, 10% of the total group (and of the males), and<br />
13% of the females, fell in the high difficulty group.<br />
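The dichotomization described above can be sketched as follows; the response vectors here are illustrative, not the survey data.

```python
# Collapse the five response options into high difficulty (options 1 and 2)
# versus low difficulty (options 3, 4, and 5), then tabulate by group.
# The option lists below are invented for illustration.
responses = {"male":   [5, 5, 3, 1, 2, 5, 4, 5, 5, 3],
             "female": [5, 2, 3, 5, 1, 3, 5, 5, 2, 5]}

def pct_high_difficulty(options):
    # Percent of respondents who chose option 1 or 2 (high difficulty).
    high = sum(1 for o in options if o in (1, 2))
    return 100.0 * high / len(options)

for group, opts in responses.items():
    print(group, pct_high_difficulty(opts))
```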
The next analysis examined whether this type of difficulty<br />
was related to ability to lift as measured by the MEPSCAT score<br />
obtained at the time of enlistment. Individuals were sorted into<br />
two groups based on their MEPSCAT score: one group consisting of<br />
those who were able to lift 110 pounds, and a second group<br />
consisting of those who were not. The difference between the<br />
groups was rather small: 9.7% of those with high MEPSCAT scores<br />
reported high difficulty; 11.5% of those with low MEPSCAT scores<br />
reported such difficulty.<br />
Next, results were compared across MOS. Substantial<br />
differences were found, with motor transport operators having the<br />
largest percentage (16) in the high difficulty group and radio<br />
repairers having the lowest percentage (3).<br />
The next set of analyses examined characteristics which<br />
might at least in part account for MOS differences. It was found<br />
that 11% of the soldiers in MOS with very heavy physical strength<br />
requirements, compared with 7% in the other MOS, fell in the high<br />
difficulty category. In combat MOS, 12% experienced a high<br />
degree of difficulty; in non-combat MOS, 8%.<br />
Some of the results followed no particular pattern,<br />
suggesting the need for further investigation. The greatest<br />
disparity between the sexes was found among light wheel vehicle<br />
mechanics, where 26% of the females, but only 9% of the males,<br />
325
reported high difficulty. While a fair number of males (14%)<br />
experienced difficulty in the motor vehicle transport job, this<br />
was another case where the percentage of females experiencing<br />
difficulty was particularly high (24%). Both male (13%) and<br />
female (18%) food service specialists also placed large numbers<br />
in the high difficulty category.<br />
The pattern of results for other items in the first factor<br />
generally followed the pattern for this item. However, MEPSCAT<br />
made a much greater difference with respect to a second item in<br />
this factor: How many times in the past six months have you been<br />
physically unable to lift an object while working on your Army<br />
job? The response options for this item were the same as those<br />
for the first item, and high difficulty and low difficulty were<br />
defined the same way as for the first item. On this item, 5.3%<br />
of those with high MEPSCAT scores reported high difficulty; 9.8%<br />
with low MEPSCAT scores so reported.<br />
Factor 2<br />
The item in Factor 2 chosen for close examination in this<br />
paper read as follows: How helpful do you think weight/strength<br />
training would be in improving your job performance? The<br />
response options for this item, and the proportion of respondents<br />
choosing each option, are shown below:<br />
Proportion<br />
Option Male Female Total<br />
1. Extremely helpful 31 18 30<br />
2. Helpful 30 26 29<br />
3. Somewhat helpful 18 20 18<br />
4. A little helpful 13 18 13<br />
5. Not at all helpful 8 18 10<br />
Again, for purposes of simplicity, responses were grouped<br />
into two categories. A “more helpful” category consisted of<br />
options 1 and 2; a “less helpful” category consisted of options<br />
3, 4, and 5. As can be seen above, 59% of the total group, 61%<br />
of the males, and 44% of the females, responded in the more<br />
helpful category.<br />
Among those able to lift 110 pounds, 61% were in the more<br />
helpful category, as opposed to 53% of those not able to lift 110<br />
pounds.<br />
A comparison across MOS revealed vast differences. Among<br />
cannon crewmembers, 73% were in the “more helpful” category.<br />
Among administrative specialists, 25% were in this category. In<br />
the MOS with very heavy strength requirements, 64% thought<br />
weight/strength training would be helpful or extremely helpful.<br />
In the other MOS, the percentage was only 49%. In combat MOS,<br />
the percentage was 70%; in non-combat MOS, 53%.<br />
326<br />
Discussion<br />
Certain characteristics of this effort suggest that we<br />
should treat these findings cautiously. We are dealing with<br />
self-report; thus, all limitations associated with self-report<br />
measures must be considered. We have observed a positive<br />
relationship between self-reported difficulty in lifting an<br />
object and performance on a more objective measure, the MEPSCAT,<br />
however, so we feel the results deserve to be taken<br />
seriously. We should also point out that we are dealing here<br />
with but two items on an 11-item scale. Until we can report more<br />
thoroughly the results of all the items, as well as the results<br />
from the supervisor version of the PRQ and from a variety of<br />
additional performance measures administered concurrently with<br />
the PRQ, these results should be considered as just a slice from<br />
a much larger picture.<br />
Having expressed these caveats, what should we make of the<br />
results? The good news is that soldiers do not report widespread<br />
difficulties with the physical demands of their jobs. The<br />
somewhat surprising news is that the overall differences between<br />
self-reported male and female difficulty are not particularly<br />
great.<br />
But when we look beneath the surface, the picture is not all<br />
that simple. There are major job differences, some not terribly<br />
surprising, some perhaps deserving further investigation. Why<br />
are the physical demands of being a mechanic, for example,<br />
apparently so much greater for females than for males? Why is<br />
there a similar disparity for truck drivers?<br />
The item on weight/strength training also revealed some<br />
interesting news. It is those people who are already strongest<br />
(in terms of their MEPSCAT scores) who are most convinced of the<br />
benefits of weight/strength training. Of course, since these<br />
individuals may be concentrated in jobs where the physical<br />
demands are the greatest, the true meaning of this finding awaits<br />
further analysis. While it may not be surprising that clerks see<br />
less need for strength training than do those in combat jobs, the<br />
extent to which clerks seem to view strength training as not<br />
particularly helpful is perhaps beyond what one might expect.<br />
The results reported here are best considered as a preview<br />
of things to come. Further analyses on a data set allowing a<br />
much broader set of comparisons, and at a higher level of<br />
sophistication, than could be completed at this time will follow.<br />
Thus, we will forego the temptation to draw major conclusions<br />
until we have travelled somewhat further along the data analysis<br />
road.<br />
327
Response Distortion on the Adaptability Screening Profile (ASP)¹<br />
Dale R. Palmer, Leonard A. White, and Mark C. Young<br />
U. S. Army Research Institute<br />
Alexandria, VA<br />
INTRODUCTION<br />
The Armed Services are considering the implementation of a biodata/temperament instrument,<br />
the Adaptability Screening Profile (ASP), to supplement education credentials as a predictor of first term<br />
attrition. A key problem in utilizing instruments like the ASP, especially in the “en masse” screening<br />
medium of the Armed Services, concerns the potential for item response distortion of the self-report<br />
information, and consequently, invalidation of the instrument over time (Walker, 1985). Previous research<br />
on the Armed Services Applicant Profile (ASAP) and the Assessment of Background and Life<br />
Experiences (ABLE), both components of the ASP, indicates that these instruments are susceptible to<br />
intentional distortion in the desired direction of the examinee (Hough, 1987; Trent, Atwater, & Abrahams,<br />
1986). Thus, it is possible that widespread distortion could occur in a service applicant setting,<br />
particularly if such distortion is encouraged. Guidelines may be written that “coach” applicants on how to<br />
do well on the test, and recruiters, in order to meet quotas, might encourage or even train applicants to<br />
respond in a particular manner (Hanson, Hallam, & Hough, 1989).<br />
Prior to the research presented in this paper, we used a sample of 324 receptees to conduct a<br />
preliminary analysis of the effects of coaching on the ASP. With the assistance of military personnel, we<br />
developed a short script intended to represent “realistic” coaching that might be given to an applicant.<br />
The coaching taught examinees how to describe themselves in order to score well on the test. They<br />
were also warned that the instrument contained items to detect socially desirable responding and<br />
therefore not to answer in ways that could not possibly be true. As expected, we found that examinees<br />
can, when asked, distort their responses to the ASP in a socially desirable direction. Unexpectedly,<br />
however, the scores of examinees who were coached and warned about faking did not differ significantly<br />
from those who were responding honestly. One explanation for this result is that the warning effectively<br />
counteracted the coaching.<br />
The research reported here was designed to replicate and extend these findings. Specifically, to<br />
separate the effects of coaching and warnings about detection, one group received coaching on<br />
“correct” responding without being warned about possible detection and a second coached group was<br />
warned about faking detection. In addition, we examined the usefulness of the ABLE’s Validity scale to<br />
correctly detect those respondents who were instructed or coached to distort their responses in a<br />
socially desirable direction.<br />
METHOD<br />
Subjects<br />
Five-hundred and two male receptees were administered the ASP at the U.S. Army Reception<br />
Battalion, Ft. Sill, OK. The receptees were tested in eight groups of 14-105. Participants were informed<br />
that the purpose of the research was to learn how different test-taking strategies affect scores on the<br />
ASP.<br />
¹Presented at the meeting of the <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>, November, 1990. All statements expressed in this paper are<br />
those of the authors and do not necessarily reflect the official opinions or policies of the U.S. Army Research Institute<br />
or the Department of the Army.<br />
328
Instruments<br />
The ASP is a combination of the ASAP and ABLE. The ASAP consists of 50 multiple choice<br />
items which are combined to yield an overall score. Responses to each item are scored 1-3, with<br />
scoring weights to best predict attrition during the first term of enlistment. The ABLE is a 70-item,<br />
construct-based temperament scale comprised of three subscales to measure Achievement, Adjustment,<br />
and Dependability. These three subscale scores are combined with unit weights to form an overall ABLE<br />
composite. A fourth, the ABLE Validity scale, is used to detect inaccuracy in examinees’ responses<br />
caused by attempts to respond in a socially desirable manner.<br />
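Per the description above, the ABLE composite is a unit-weighted sum of the three subscale scores. As a rough consistency check (the function name is ours, not the operational scoring code), summing the honest-condition subscale means from Table 1 of this paper lands within rounding error of the reported ABLE Total mean.

```python
# Unit-weighted ABLE composite: the three subscale scores are simply summed.
# Sketch for illustration; not the operational scoring software.
def able_composite(achievement, adjustment, dependability):
    return achievement + adjustment + dependability

# Honest-condition subscale means from Table 1 of this paper:
total = able_composite(54.90, 33.06, 54.02)
print(round(total, 2))  # 141.98, close to the reported ABLE Total mean of 142.01
```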
Procedure<br />
The design was a 4 x 2 between-subjects factorial with four levels of instructional condition and<br />
two orders of test administration. One-half of the subjects within each session completed the ABLE prior<br />
to the ASAP and one-half first took the ASAP followed by the ABLE. The four instructional conditions<br />
were as follows:<br />
Honest. Instructions followed those developed for the proposed operational ASP.<br />
Participants were instructed to “pick the response that best describes your attitudes or past<br />
experiences.”<br />
Fake Good. Subjects in this condition were told to “select the answer that describes yourself in<br />
a way that you think will make sure that the Army selects you....Your response should be the choice that<br />
you think would impress the Army the most.”<br />
Coached-With Warning. The instructions in this condition were designed to represent coaching<br />
strategies that might be used to help applicants for the Armed Services score well. Subjects were told<br />
to, “select the answer that describes yourself in a way that you think will make sure that the Army selects<br />
you...to make a good impression...answer so that you look mature, responsible, well-adjusted, hardworking,<br />
and easy to get along with.” In addition, subjects were told to “be aware that there are<br />
questions designed to detect if you are trying to make yourself look too good. So, answer in a way that<br />
makes you look good, but try to avoid answering any of the questions in a way that cannot possibly be<br />
true.”<br />
Coached-Without Warning. Subjects in this condition received the same coaching instruction as<br />
those in the coached-with warning group, except that no warning about items to detect faking was<br />
provided.<br />
RESULTS<br />
Descriptive Statistics<br />
Table 1 presents the means and standard deviations of the six ASP subscales and composites in<br />
the four instructional conditions. Overall, mean ASP scores were highest for examinees who were<br />
coached on the “correct” responses or instructed to fake good. Note that the mean ASP scores for<br />
respondents who were warned about possible detection of faking were most similar to scores in the<br />
honest condition.<br />
Effect of Test Order and Instructional Condition<br />
Six 4 x 2 ANOVAs were used to examine the effects of instructional condition (4 levels), test<br />
order (2 levels), and their interaction on the dependent variables. The main effect of instructional<br />
condition was highly significant (p&lt;.001) for all ASP scales. The highest F value was obtained for the<br />
329<br />
Table 1<br />
Effect of Instructional Conditions on ASP Scales<br />
                                   Instructional Condition<br />
                                                 COACHED-        COACHED-<br />
Scale          HONEST          FAKE GOOD        WITH WARNING    NO WARNING<br />
               (n=126)         (n=148)          (n=109)         (n=100)<br />
ASAP Total     114.58 (10.67)  120.50 (9.35)    117.59 (12.06)  120.80 (9.86)<br />
ABLE Total     142.01 (15.18)  158.33 (14.44)   144.88 (14.54)  155.51 (15.87)<br />
Achievement    54.90 (7.72)    62.13 (6.53)     56.07 (6.30)    60.13 (7.13)<br />
Adjustment     33.06 (4.50)    36.51 (4.13)     33.36 (4.64)    35.18 (4.98)<br />
Dependability  54.02 (5.49)    59.61 (5.39)     54.50 (6.22)    58.99 (5.63)<br />
Fake Validity  15.99 (3.27)    21.40 (5.50)     16.24 (3.75)    20.74 (4.77)<br />
Note. The maximum sample sizes are reported. Sample sizes vary slightly across outcome<br />
measures. Standard deviations are presented in parentheses.<br />
ABLE Validity scale, F(3, 475) = 50.75, p&lt;.001. As shown in Table 1, the honest and coached-with<br />
warning groups had comparable means on the Validity scale, with M = 15.99 and M = 16.24,<br />
respectively. By comparison, the means on this scale were about one standard deviation higher in the<br />
fake good (M = 21.40) and coached (M = 20.74) groups. None of the main effects of test order or the<br />
treatment group by test order interaction was significant (all p&gt;.05).<br />
Effect Sizes for Instructional Group Comparisons<br />
Effect sizes for the 6 possible combinations of instructional comparisons for all ASP scales and<br />
composites are reported in Table 2. Scheffe test significance levels for each comparison are also<br />
shown.<br />
Table 2<br />
Effect Sizes for Instructional Group Comparisons<br />
               Honest v.   Honest v.   Honest v.    Fake Good v.  Fake Good v.  Coached-W v.<br />
Scale          Fake Good   Coached-W   Coached-NW   Coached-W     Coached-NW    Coached-NW<br />
ASAP Total     -0.56*      -0.26       -0.58*       +0.27         -0.03         -0.26<br />
ABLE Total     -0.96*      -0.19       -0.88*       +0.84*        +0.19         -0.73*<br />
Achievement    -0.90*      -0.16       -0.67*       +0.95*        +0.30         -0.64*<br />
Adjustment     -0.74*      -0.06       -0.47*       +0.60*        +0.31         -0.39<br />
Dependability  -0.91*      -0.08       -0.90*       +0.81*        +0.11         -0.72*<br />
Fake Validity  -1.01*      -0.07       -1.45*       +0.94*        +0.12         -1.20*<br />
Note. Coached-W = Coached with a warning about fake detection items in the test.<br />
Coached-NW = Coached without a warning about fake detection items in the test.<br />
Effect size = the difference in group means divided by the pooled group standard deviation.<br />
*p &lt; .05 (Scheffé test).<br />
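The effect-size definition in the table note (difference in group means divided by the pooled group standard deviation) can be sketched as follows. The example plugs in the Table 1 means and SDs for one comparison, so it will not exactly reproduce the corresponding Table 2 entry, which may rest on somewhat different samples.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    # Pooled standard deviation of two independent groups.
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def effect_size(m1, s1, n1, m2, s2, n2):
    # Difference in group means divided by the pooled group SD.
    return (m1 - m2) / pooled_sd(s1, n1, s2, n2)

# ABLE Total: honest (n=126) vs. fake good (n=148), means and SDs from Table 1.
d = effect_size(142.01, 15.18, 126, 158.33, 14.44, 148)
print(round(d, 2))  # about -1.1: honest scores roughly one SD below fake-good
```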
Overall, the results replicate the findings from our previous research. As in the earlier experiment,<br />
the scores of soldiers given the “fake good” instructions were significantly higher than those of soldiers in the<br />
honest condition. This shows that the “fake good’ instructions were effective in producing positive<br />
response distortion.<br />
Also, scores resulting from the honest and coached-with warning conditions were not significantly<br />
different from each other. Thus, response distortion on the ASP was reduced (but not necessarily<br />
eliminated) in the group given the ‘coached-with warning” instructions. The combination of the warning<br />
of fake detection items and instructions not to appear “too perfect” may be responsible for this<br />
suppression of positive response distortion.<br />
In our extension of the research, we also examined the effect of coaching when no warning about<br />
fake detection items is given. As shown in Table 2, soldiers in this condition had significantly higher<br />
scores than those given the “honest” instructions. However, the scores of those in the coached-without<br />
warning group did not differ significantly from those in the fake good condition. Thus, the general<br />
(faking) strategy (i.e., describing oneself in a way that ensures being selected by the Army) and the more<br />
specific (coached) strategy (trying to present oneself as mature, responsible, well-adjusted, hardworking,<br />
well organized, and easy to get along with) were equally effective in producing response<br />
distortion. Finally, in comparison with the “coached” instructions, the addition of a warning about faking<br />
detection items resulted in significantly lower scores on 4 out of the 6 scales. This demonstrates that the<br />
warning was at least partially effective in reducing response distortion.<br />
Group Differences in Correlations Among ASP Scale Scores<br />
Correlations of the Validity scale with the other ASP scales were examined in each of the four<br />
conditions. As expected, the lowest correlations with the Validity scale were found when examinees<br />
were responding honestly (r = .20 to .37, all p&lt;.05). The highest correlations with the Validity scale<br />
were found when subjects were coached or told to fake in the socially desirable direction (r = .30 to .71,<br />
all p&lt;.05). The correlations with the Validity scale within the coached-with warning group (r = .11 to<br />
.50) were generally higher than the correlations found within the honest group, but smaller than the<br />
correlations for the two other groups. This indicates that the coached-with warning group distorted their<br />
responses in a positive direction, but not as much as the faking or coached groups.<br />
Utility of the Validity Scale<br />
for Detecting Response Distortion<br />
The purpose of the ABLE Validity scale is to identify individuals who have distorted their responses in<br />
a socially desirable direction. We examined how effective this scale would be in correctly classifying<br />
persons who were coached or instructed to distort their ASP responses.<br />
Table 3 shows how well the Validity scale discriminates among the groups, for each possible cut<br />
score that might be used to classify distorted responses. For example, with a cut score of 27, no one in<br />
the honest group would be incorrectly classified as faking (i.e., deliberately distorting responses in a<br />
socially desirable direction). However, this cut score would correctly classify 22% of those given the<br />
fake good instructions, 15% of those coached, and 3% of those coached-with warnings. Thus, all<br />
individuals in the fake good or coached conditions who were at or above the cut score would be<br />
correctly classified as fakers. Moreover, this would be done without misclassifying anyone in the honest<br />
group (since no one in this group had a Validity score above 26). The results also show that response<br />
distortion among those given the coached-with warning instructions is most difficult to detect. This is<br />
consistent with the finding that Validity scores between the honest and coached-with warning groups do<br />
not differ significantly.
Table 3<br />
Detection of Response Distortion Among Instructional Groups<br />
Using the Validity Scale<br />
Validity Scale   Percent False Alarms   Percent of    Percent Coached-W   Percent Coached-NW<br />
Cut Score        (in the Honest         All Fakers    Respondents         Respondents<br />
(at or above)    sample)                Detected      Detected            Detected*<br />
                 (n=233)                (n=213)       (n=248)             (n=100)<br />
11               100.0                  100.0         100.0               100.0<br />
12               97.4                   99.1          93.1                100.0<br />
13               91.8                   97.2          89.5                99.0<br />
14               75.5                   93.0          77.4                97.0<br />
15               62.7                   89.7          66.9                93.0<br />
16               49.4                   85.0          53.6                86.0<br />
17               40.3                   79.4          46.3                81.0<br />
18               28.3                   74.2          34.6                72.0<br />
19               19.7                   66.7          23.7                65.0<br />
20               13.7                   59.2          16.0                55.0<br />
21               9.9                    51.6          11.6                46.0<br />
22               6.0                    43.2          9.6                 39.0<br />
23               3.0                    37.1          6.8                 32.0<br />
24               1.7                    29.6          4.8                 24.0<br />
25               1.3                    27.7          4.0                 22.0<br />
26               0.4                    23.5          2.8                 19.0<br />
27               0.0                    21.6          2.8                 15.0<br />
28               0.0                    16.4          2.8                 8.0<br />
29               0.0                    13.1          2.0                 7.0<br />
30               0.0                    10.3          0.8                 5.0<br />
31               0.0                    8.9           0.4                 5.0<br />
32               0.0                    1.9           0.0                 2.0<br />
33               0.0                    1.4           0.0                 2.0<br />
Note. Coached-W = Coached with a warning about fake detection items in the test.<br />
Coached-NW = Coached without a warning about fake detection items in the test.<br />
Except as noted, samples were aggregated from two experiments.<br />
*Sample was obtained from the current experiment only.<br />
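The cut-score tabulation in Table 3 amounts to counting, per group, the share of scores at or above each candidate cut. A minimal sketch with invented scores:

```python
# Tabulating detection rates at a validity-scale cut score, as in Table 3.
# The score lists are invented for illustration; they are not the study data.
honest = [14, 15, 16, 17, 18, 19, 20, 21, 22, 26]
fakers = [18, 20, 22, 24, 25, 27, 28, 29, 30, 33]

def pct_at_or_above(scores, cut):
    # Percent of scores classified as distorted at this cut ("at or above").
    return 100.0 * sum(s >= cut for s in scores) / len(scores)

cut = 27
false_alarms = pct_at_or_above(honest, cut)  # honest respondents misclassified
hits = pct_at_or_above(fakers, cut)          # deliberate distorters detected
print(false_alarms, hits)  # 0.0 50.0 for these invented scores
```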
DISCUSSION<br />
The results in this paper serve to corroborate earlier findings by Hanson et al. (1989), while<br />
adding new information on the effects of coaching. First, we found that the inclusion of warning<br />
statements about lie detection items seems to suppress response distortion to almost honest condition<br />
levels. The use of warning statements may be helpful to deter intentional distortion in future<br />
administrations of the ASP. Secondly, the Validity scale was found to be reasonably effective in<br />
detecting response distortion. High Validity scale scores were shown to correctly identify a substantial<br />
percentage of fakers, without misclassifying honest respondents.<br />
In addition to these findings, our results suggest that coaching instructions designed to simulate<br />
“real-life” coaching by recruiters may be no more effective in eliciting response distortion than general<br />
instructions to “fake good”. Outside guidance on distorting ASP responses may serve to motivate<br />
applicants to fake. However, it is questionable as to whether such guidance would make a significant<br />
difference in the ASP scores of applicants who would otherwise be motivated to dissemble.<br />
Finally, future research will examine how positive response distortion affects the validity of the ASP<br />
for predicting attrition. We plan to investigate the feasibility of using the Validity scale to adjust ASP<br />
scores for faking. Such an adjustment might enhance the validity of the ASP in predicting attrition, as<br />
well as other important Army criteria.<br />
332
REFERENCES<br />
Hanson, M.A., Hallam, G.L., & Hough, L.M. (1989, November). Detection of response distortion in the<br />
Adaptability Screening Profile (ASP). Paper presented at the 31st Annual Conference of the<br />
<strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>, San Antonio, TX.<br />
Hough, L.M. (1987, August). Overcoming objections to use of temperament variables in selection:<br />
Demonstrating their usefulness. Paper presented at the American Psychological <strong>Association</strong><br />
Convention, New York, NY.<br />
Trent, T., Atwater, D.C., & Abrahams, N.M. (1986, April). Experimental assessment of item response<br />
distortion. In Proceedings of the Tenth Psychology in the DoD Symposium. Colorado Springs,<br />
CO: U.S. Air Force Academy.<br />
Walker, C.B. (1985). The fakability of the Army's Military Applicant Profile (MAP). Paper presented at the<br />
<strong>Association</strong> of Human Resources Management and Organizational Behavior proceedings,<br />
Denver, CO.<br />
333<br />
PSYCHOMETRIC PROPERTIES OF A NUMBER COMPARISON TASK:<br />
MEDIUM AND FORMAT EFFECTS<br />
Banderet, L.E., Shukitt-Hale, B.L., Lieberman, H.R., Simpson,¹<br />
LTC R.L., Perez,¹ CPT P.J., U.S. Army Research Institute of<br />
Environmental Medicine, Natick, MA, and ¹TEXCOM Armor and<br />
Engineer Board, Advanced Technology Research Div., Fort Knox, KY.<br />
ABSTRACT<br />
Researchers adapting or developing performance tasks for<br />
administration on personal computers are confronted with choices<br />
that may affect the task's measurement properties. To evaluate<br />
the effects of test medium, subjects completed Number Comparison<br />
(NC) tasks administered with both paper-and-pencil and portable<br />
computer media. Computerized NC proved superior to<br />
paper-and-pencil NC; the automated version had greater completion<br />
rates, reliabilities, and sensitivity to environmental<br />
stressors (hypoxia and cold).<br />
In a second study investigating task format, subjects were<br />
tested with a computerized NC task which presented either 1 or 33<br />
problems in each display window. Although the results were<br />
similar for these two formats, the response rates for the two<br />
formats were dependent upon the number of administrations. On<br />
some of the later administrations, rates for the multiple-problem<br />
format were 10% greater. Thus, formal evaluation during<br />
adaptation or development of computerized performance tasks helps<br />
ensure evolving tasks will possess reliability, sensitivity, and<br />
other useful psychometric properties.<br />
INTRODUCTION<br />
<strong>Testing</strong> performance capabilities with tasks automated by<br />
computers is more feasible today than ever before since computers<br />
possess better displays, process information faster, execute<br />
larger programs and data bases, store more information, and cost<br />
less. When a performance task is adapted or developed for<br />
administration by computer, the subject’s output responses and<br />
the instrument’s psychometric properties may change (Banderet et<br />
al., 1989; Moreland, 1987).<br />
We evaluated the automation of a performance task in two<br />
studies. In the first, an automated Number Comparison task (C-NC)<br />
was compared to its paper-and-pencil equivalent (P-NC). In the<br />
second study, the format of the display on the automated version<br />
was evaluated. Displays with a single problem were compared with<br />
displays with 33 problems. This report will describe the effects<br />
of task medium and format upon the psychometric properties of an<br />
automated NC task.<br />
334
METHOD<br />
Subjects---Twenty medical research volunteers from Fort Detrick,<br />
MD, and Natick, MA, were subjects for study 1. Thirty-two M1-A1<br />
armor personnel from Ft. Knox, KY, participated in study 2. All<br />
soldiers participated in these studies after they were given<br />
physicals and were fully informed about the conditions and procedures<br />
of the study. Investigators adhered to AR 70-25 and<br />
USAMRDC Regulation 70-25 on Use of Volunteers in Research.<br />
Assessment Instruments---The Number Comparison Task involves<br />
evaluating pairs of numbers to determine if the two numbers in<br />
each problem are the same or different. In the first study,<br />
automated and paper-and-pencil versions of the NC task were<br />
studied. The paper-and-pencil task (P-NC) was generated by<br />
computer and printed on a laser copier. The automated Number<br />
Comparison (C-NC) task was administered on a GRiD Compass<br />
portable computer. A subject's response could not be changed<br />
after it was entered on the keyboard of the automated task.<br />
These assessment measures and experimental data are described<br />
elsewhere (Shukitt et al., 1988).<br />
In the second study, two formats of the automated NC task<br />
were studied. During testing, a display on a subject’s computer<br />
showed either 1 or 33 problems to be evaluated. The latter format<br />
was similar to the format used in study 1 on both versions of NC.<br />
Procedures---Both studies reported in this paper were repeated-<br />
measures designs and were incorporated into larger investigations<br />
with other objectives. The first was to determine if an<br />
amino acid, tyrosine, prevents some of the adverse behavioral<br />
effects induced by environmental stressors. Specifically, 20<br />
subjects were exposed to 4700 m of simulated high altitude and<br />
17°C for 7 h; on two other occasions they were exposed to 550 m and<br />
22°C (baseline). The automated and paper-and-pencil versions of<br />
the NC task were administered 300-320 minutes after ascent with<br />
10 min separating their respective administrations. Initially,<br />
subjects practiced the NC task 15 times and learned to perform<br />
quickly with few errors. To assess<br />
sensitivity to experimental effects, a z score was calculated<br />
since it reflected both the magnitude and variability of measured<br />
effects.<br />
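The z-score sensitivity index described here can be sketched as the mean change score divided by the standard deviation of the change scores; the change scores below are invented, not the experimental data.

```python
import math

def sensitivity_z(changes):
    # Mean change under the stressor divided by the SD of the change
    # scores, so the index reflects both magnitude and variability.
    n = len(changes)
    mean = sum(changes) / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in changes) / (n - 1))
    return mean / sd

# Invented change scores (correct/min, altitude minus baseline):
print(round(sensitivity_z([-4, -6, -5, -7, -3]), 2))  # -3.16
```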
RESULTS<br />
At 550 m + 22°C, performance rates were greatest for the<br />
automated NC task; the P-NC rate was 87% of the C-NC rate (see Table I).<br />
Task definition for the C-NC task was also better than its manual<br />
counterpart. The reliability of administrations during the<br />
experimental and control conditions was greatest for the C-NC<br />
task. The C-NC version of the NC task was also more sensitive to<br />
altitude effects than the manual version of this performance task<br />
as inferred from z score magnitudes.<br />
TABLE I: Properties of the paper-and-pencil (P-NC) and<br />
automated (C-NC) versions of the Number Comparison Task<br />
at 550 m + 22°C and 4700 m + 17°C.<br />
CRITERION                    STATISTIC      P-NC     C-NC<br />
BASELINE VALUES (550 m + 22°C)<br />
Baseline Rates               Mean           25.13    28.78<br />
(correct/min)                Sigma          8.02     8.88<br />
Minutes Practice Required    Mean           20       20<br />
(min/admin)<br />
Task Definition              Pearson's r    .89      .94<br />
(admin 5 and 6)<br />
Reliability                  Pearson's r    .81      .91<br />
(550 m vs. 4700 m)<br />
ALTITUDE + COLD EFFECTS (4700 m + 17°C)<br />
Altitude Effect              Mean           -5.04    -5.26<br />
(change in correct/min)      Sigma          5.08     4.07<br />
                             z score        -.99     -1.29<br />
336<br />
.
In the second study, the response rates for the multiple-<br />
problem format interacted with administrations (i.e., days).<br />
Rates for the multiple-problem format were always greater than<br />
rates for the single-problem format; on days 3 and 4 of the field<br />
test they were approximately 10% greater (see Fig. 1). Practice<br />
requirements, task definition, and task sensitivities were<br />
comparable for the two different display formats. The larger<br />
response rates for the multiple-problem format are noteworthy<br />
since faster rates are usually associated with tasks that have<br />
superior psychometric properties.<br />
Fig. 1: Response rates for an automated Number<br />
Comparison Task for a 1-problem or a 33-problem<br />
display. Each task was practiced four times previously<br />
(5 min per administration).<br />
DISCUSSION<br />
Automated NC was superior to its paper-and-pencil<br />
counterpart. The response rates, sensitivity to environmental<br />
stressors, and test-retest reliabilities were greater for the<br />
automated version than for the paper-and-pencil version. This<br />
demonstrates that when performance tasks are automated, modified,<br />
or developed, they may have different psychometric properties than<br />
their traditional counterparts. In this evaluation, the<br />
automated version of the NC task possessed the best psychometric<br />
properties.<br />
Second, it is important that performance tasks be<br />
evaluated during their adaptation or development. The success of<br />
performance tasks is usually dependent upon their psychometric<br />
properties (e.g., sensitivity, requirements for practice, and<br />
test-retest reliabilities). Evaluation will ensure that the<br />
psychometric characteristics of the task can be optimized and<br />
that appropriate measures will be retained and used.<br />
REFERENCES<br />
Banderet, L.E., Shukitt, B.L., Walthers, M.A., Kennedy, R.S.,<br />
Bittner, A.C., Jr., & Kay, G.G. (1989). Psychometric properties<br />
of three addition tasks with different response requirements.<br />
Proceedings of the 30th Annual Meeting of the Military Testing Association (pp.<br />
440-445). Arlington, VA: U.S. Army Research Institute for the<br />
Behavioral and Social Sciences.<br />
Moreland, K.L. (1987). Computerized psychological assessment:<br />
What's available. In J.N. Butcher (Ed.), Computerized<br />
Psychological Assessment (pp. 28-49). New York: Basic Books.<br />
Shukitt, B., Burse, R.L., Banderet, L., Knight, D.R., & Cymerman,<br />
A. (1988). Cognitive performance, mood states, and altitude<br />
symptomatology in 13-21% oxygen environments (Tech. Rep. No.<br />
18/88). Natick, MA: U.S. Army Research Institute of Environmental<br />
Medicine.<br />
SUBJECTIVE STATES QUESTIONNAIRE: PERCEIVED WELL-BEING<br />
AND FUNCTIONAL CAPACITY<br />
Banderet, L.E., O'Mara, M., Pimental, N.A., Riley, SGT R.H.,<br />
Dauphinee, SSG D.T., Witt, SSG C.E., Toyota, SGT R.M., U.S. Army<br />
Research Institute of Environmental Medicine and Navy Clothing<br />
and Textile Research Facility, Natick, MA.<br />
ABSTRACT<br />
Self-rated measures of symptoms and moods are especially<br />
sensitive to stressors and often detect changes in well-being<br />
before more objective indices (Beck, 1979). We developed a<br />
40-item Subjective States Questionnaire (SSQ) to exploit such<br />
measurement properties in our research program for determining<br />
the effects of extreme environments and evaluating treatment<br />
strategies. The SSQ assesses a greater range of reactions than<br />
most symptom or mood scales and seeks estimates of a soldier's<br />
capacity to perform common soldier tasks and other familiar<br />
activities, or the effort required to complete them.<br />
In a laboratory study of heat stress, SSQ data were collected<br />
during six 135-minute test sessions. Nine soldiers gave<br />
verbal ratings of "how they felt at that moment" during selected<br />
exercise, rest, and recovery intervals. Many subjective states<br />
appear sensitive to these manipulations. Ratings of most capabilities<br />
return rapidly to normal after termination of exercise<br />
and heat exposure.<br />
INTRODUCTION<br />
Self-rated measures of symptoms, moods, and behavioral capabilities<br />
are often more sensitive than objective measures of psychological<br />
phenomena (Beck, 1979). The sensitivity of self-rated<br />
measures probably results because many phenomena can be assessed<br />
with self-rated instruments, human subjects can recall and integrate<br />
personal experiences over time, and sensory and perceptual<br />
systems are most responsive to changes in stimulation or<br />
activity.<br />
To exploit the advantages of self-rated measures in our ongoing<br />
research with soldiers exposed to environmental stressors,<br />
we developed the Subjective States Questionnaire (SSQ). This<br />
questionnaire assesses perceived capability, or the effort to<br />
complete a task, by having the military subject relate such constructs<br />
to common soldier tasks or other familiar activities.<br />
This paper describes preliminary findings with the SSQ from an<br />
experiment where military subjects were tested experimentally in<br />
a hot physical environment while wearing various uniform ensembles.<br />
METHOD<br />
Subjects---Nine physically fit males (average age,<br />
23 years; height, 69 in; weight, 165 lbs) volunteered for the<br />
test after they were fully informed about the conditions and<br />
procedures of the study (Pimental, Avellini, & Banderet, in<br />
progress).<br />
Assessment Instruments---The SSQ is a 40-item, self-rated instrument<br />
(see Table I). It assesses perceived cognitive, memory,<br />
affective, sensory-perceptual, psychomotor, verbal, and kinesthetic<br />
capabilities. Many items operationally define estimates of<br />
such capabilities by relating them to selected common soldier<br />
tasks (HQ, Department of Army, 1987). For example, "I would have trouble<br />
running 2 miles in anything near my normal time." Some items are<br />
defined by relating them to familiar activities; e.g., Item 22, "I<br />
could remember spoken directions to a store a few miles from<br />
here." Twenty of the items in the SSQ are positive; e.g., "I<br />
could properly camouflage myself and my equipment." The other<br />
twenty items are negative; e.g., "If I were driving an automobile,<br />
I might commit traffic violations or cause accidents."<br />
Each item is rated on a 6-point scale with discrete anchor<br />
points; i.e., "Not At All," "Slight," "Somewhat," "Moderate,"<br />
"Quite a Bit," and "Extremely." The SSQ can be administered as a<br />
mark-sense questionnaire, as an automated questionnaire on a computer,<br />
or as an oral survey.<br />
To simplify description and display of data from individual<br />
items of the SSQ, all ratings for negative items are recoded and<br />
their verbal descriptions are restated positively. These transformations<br />
change each negative item so it assesses a<br />
"capability," and greater ratings reflect greater capability. For<br />
example, item 29 is "I would probably miss some information in<br />
military radio messages, without some 'say agains'." During data<br />
analysis, this item's ratings are recoded and it is restated as<br />
"I could probably comprehend most information in radio messages,<br />
without some 'say agains'." After such transformations, all 40<br />
items of the SSQ assess "capabilities" and larger ratings imply<br />
greater capability.<br />
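The recoding step can be sketched as follows. This is a hypothetical implementation: the paper does not give the numeric coding, so a 1-6 coding of the six anchors ("Not At All" = 1 through "Extremely" = 6) is assumed, and the set of negative item numbers shown is an illustrative subset of the twenty:

```python
# Illustrative subset of the 20 negative SSQ items (assumed numbering).
NEG_ITEMS = {1, 2, 23, 24, 29}

def recode(item_no, rating, scale_max=6):
    """Reflect negative items so larger values always mean greater capability."""
    if item_no in NEG_ITEMS:
        # On a 1..scale_max scale, reflection maps 1 -> scale_max, etc.
        return scale_max + 1 - rating
    return rating
```

For instance, a rating of "Extremely" (6) on negative item 29 becomes 1 after recoding, while ratings on positive items pass through unchanged.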
Procedures---Subjects exercised for 2 hours per day in a hot<br />
environment for 6 days before manipulation of experimental conditions.<br />
This avoided confounding the effects of physiological<br />
heat acclimation with heat strain induced by the experimental<br />
conditions. Acclimatizing conditions were 95°F dry bulb and 88°F<br />
wet bulb (75% relative humidity) with a 2.0 mph wind.<br />
Then, the men were tested in a repeated-measures design to<br />
evaluate six configurations of a Navy firefighting ensemble with<br />
different heat-retaining properties (Pimental, Avellini, &<br />
Banderet, in progress). Each test day, each man wore a new,<br />
randomly assigned configuration of the firefighting ensemble.<br />
Environmental conditions during each 2-hour experimental session<br />
were 90°F dry bulb, 79°F wet bulb (60% relative humidity) with a<br />
2 mph wind. Subjects alternately sat for 15 minutes (metabolic<br />
rate 105 watts) or walked at 3.5 mph (500 watts) on a level<br />
treadmill. The time-weighted metabolic rate was approximately 300<br />
watts.<br />
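The stated time-weighted rate is easy to verify arithmetically, assuming equal time in the two alternating activities as described:

```python
# Metabolic rates from the text: 105 W seated, 500 W walking,
# alternating in equal 15-min intervals (an assumption consistent
# with the procedure described).
sit_w, walk_w = 105, 500
time_weighted = (15 * sit_w + 15 * walk_w) / (15 + 15)  # 302.5 W, i.e., ~300 W
```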
The SSQ was administered 5, 20, 95, 110, and 125 min after<br />
the start of each experimental session. Each administration began<br />
in the fifth minute of a scheduled resting, walking, resting,<br />
walking, or recovery interval, respectively. During each assessment,<br />
each item on the SSQ was read to the group by a medical NCO. Each<br />
subject's ratings were sensed by a lapel microphone and recorded<br />
on a separate audio channel for subsequent data encoding and<br />
analysis. The last (fifth) administration was immediately after<br />
an experimental session. Subjects removed their uniforms and<br />
monitoring equipment and were tested in a room (at normal ambient<br />
temperature) 5 min after they finished walking on the treadmill.<br />
All data were analyzed with SPSS/PC+, V3.0. Results were<br />
significant if p < 0.05 (1-tailed). Data were frequently missing<br />
during a daily test session and often involved different subjects<br />
from administration to administration of an item. Paired t-tests,<br />
rather than more traditional repeated-measures analyses of<br />
variance, were used in evaluating uniform ensembles, since<br />
this statistic is not affected by missing values occurring<br />
during another administration in a session.<br />
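The analysis choice described above can be sketched as follows. This is an illustrative reimplementation (the original analysis used SPSS/PC+), showing how a paired t-test uses only the subjects observed in both conditions, so a value missing from one administration does not discard the rest of a session:

```python
import math

def paired_t(a, b):
    """Paired t statistic; a, b are per-subject ratings, None = missing.

    Only pairs with both values present enter the test (pairwise deletion).
    Returns t with n - 1 degrees of freedom, where n is the pair count.
    """
    diffs = [x - y for x, y in zip(a, b) if x is not None and y is not None]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

The subject with a missing rating simply drops out of that one comparison rather than forcing listwise deletion across the whole session.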
RESULTS<br />
These data demonstrate responsiveness of the SSQ to heat<br />
strain induced by metabolic heat production from exercise and<br />
high environmental temperatures. These data reflect average<br />
subject responses under conditions of increasing heat storage<br />
induced by walk-rest activities.<br />
Activity-time changes on individual SSQ items during experimental<br />
sessions suggested three trends: decreased capabilities<br />
during the session with rapid recovery afterwards (Fig. 1),<br />
decreased capabilities without rapid recovery (Fig. 2), and no<br />
apparent changes for some capabilities (Fig. 3). Each bar in Figs.<br />
1-3 has a "+" symbol above it; the horizontal bar on each symbol<br />
is the standard error of the mean for that data point. This<br />
report only shows illustrative data because of space limitations.<br />
Analysis of these data for other purposes required use of<br />
multiple comparisons to evaluate various configurations of the<br />
firefighting ensemble for different activities-times during the<br />
session. Table II shows items which appear most frequently<br />
affected by these conditions, since these items were more often<br />
statistically significant for these comparisons.<br />
On some items, perceived capabilities are least during exercise,<br />
e.g., Item 24: "I would have trouble running 2 miles in<br />
anything near my normal time." On other items, perceived capabilities<br />
are least during rest following exercise, e.g., Item 35:<br />
"I feel as good as I usually feel." Most capabilities recover<br />
rapidly following exercise and heat exposure, since values obtained<br />
5-10 min after the end of the experimental challenge are<br />
similar to baseline values. A few capabilities recover more<br />
slowly, since they are still impaired during the last<br />
administration.<br />
Missing data were evident for all conditions but were more<br />
frequent during exercise or when subjects were approaching medical<br />
safety limits or feeling ill. Although some data were lost<br />
because of equipment and procedural shortcomings, most missing<br />
data were caused by failures of the subjects to respond when they<br />
were uncomfortable or preoccupied with other activities.<br />
TABLE I: Actual Items on The Subjective States Questionnaire.<br />
1. I feel "overwhelmed."<br />
2. I feel "vulnerable."<br />
3. Right now, I could answer most promotion board questions.<br />
4. It would be more difficult than usual to understand new concepts that are being taught in a military class.<br />
5. My thinking and other mental processes are at their "max."<br />
6. It would require more effort than usual to tell someone how to "shoot an azimuth."<br />
7. My vision seems especially sharp and clear.<br />
8. My thoughts seem complete.<br />
9. I feel like spit shining my boots and polishing my brass.<br />
10. My body feels clumsy and awkward in this situation.<br />
11. I could complete gas mask confidence training, including unmasking in the "gas chamber," with no difficulty.<br />
12. It would take more effort than usual to complete a land navigation course.<br />
13. I feel "out of touch" with my surroundings.<br />
14. I feel confused.<br />
15. I could easily play a difficult video game for 20-25 minutes.<br />
16. My thinking seems "sluggish."<br />
17. I am having trouble remembering some things now.<br />
18. Staying in this study hardly seems worth it.<br />
19. Sending a grid coordinate by radio would require greater effort than usual.<br />
20. If I were driving a motor vehicle, my actions would seem "jerky" and "unconnected."<br />
21. I could properly camouflage myself and my equipment.<br />
22. I could remember spoken directions to a store a few miles from here.<br />
23. If I were driving an automobile, I might commit traffic violations or cause accidents.<br />
24. I would have trouble running 2 miles in anything near my normal time.<br />
25. I can talk freely without stuttering.<br />
26. A 2-3 hour G.I. party might be difficult to "deal with."<br />
27. Telling even a short joke would require more effort than usual.<br />
28. If a "password" and "challenge" were changed every two hours, it might be difficult for me to remember them.<br />
29. I would probably miss some information in military radio messages, without some "say agains."<br />
30. I feel disoriented.<br />
31. I am as aware of feelings in my arms, legs, and body as I usually am.<br />
32. It would be hard to be up for 24 hours of guard duty now.<br />
33. I could disassemble and reassemble an M-16 correctly within time limits.<br />
34. Detecting a soldier in BDUs in tall brush would take more effort than it usually does.<br />
35. I feel as good as I usually feel.<br />
36. I would confuse some of the azimuths with the directions they represent.<br />
37. I feel "ate up."<br />
38. My memory is working as well as it usually does.<br />
39. I feel good enough to max at least one part of the PT test.<br />
40. I would find it more difficult than usual to find a landmark such as railroad tracks on a map.<br />
FIG. 1: SSQ item 3, "Right Now, I Could Answer Most Promotion Board Questions," shows decreased capability with<br />
increasing heat strain, with partial recovery following termination of heat exposure and exercise.<br />
FIG. 2: Transformed SSQ item 1, "I Feel In Control (vs. Overwhelmed)," shows decreased capability with increasing<br />
heat strain with little, if any, recovery following termination of heat exposure and exercise.<br />
FIG. 3: Transformed SSQ item 36, "I Could Associate Azimuths With The Directions That They Represent," shows little,<br />
if any, effect upon capability with increasing heat strain or termination of heat exposure and exercise.<br />
TABLE II: Actual Items from The Subjective States Questionnaire that yielded frequent statistically significant differences<br />
on comparisons of firefighting ensembles.<br />
1. I feel "overwhelmed."<br />
2. I feel "vulnerable."<br />
3. Right now, I could answer most promotion board questions.<br />
10. My body feels clumsy and awkward in this situation.<br />
12. It would take more effort than usual to complete a land navigation course.<br />
17. I am having trouble remembering some things now.<br />
18. Staying in this study hardly seems worth it.<br />
23. If I were driving an automobile, I might commit traffic violations or cause accidents.<br />
24. I would have trouble running 2 miles in anything near my normal time.<br />
30. I feel disoriented.<br />
37. I feel "ate up."<br />
Acceptability of the questionnaire to our military test subjects<br />
was better than for most symptom, mood, or personality questionnaires<br />
that we have administered before. This observation was<br />
supported by discussions about some items and comments suggesting<br />
the items were relevant to a soldier's training and experiences.<br />
Subjects volunteered that items also made them think about the<br />
implications of performing military tasks in stressful situations.<br />
DISCUSSION<br />
This study explored the perceived capabilities of soldiers<br />
to perform common soldier tasks and other familiar activities<br />
under varying degrees of heat strain. Varied human capabilities<br />
were affected by heat exposure and exercise. Interestingly, most<br />
items showed recovery even 5 min after termination of heat exposure<br />
and exercise. These preliminary results suggest that the<br />
SSQ may be useful in other situations which use military personnel<br />
as test subjects. The content of items fosters interest and<br />
cooperation, useful assets especially in challenging testing<br />
situations.<br />
Surveying a group of subjects orally, as was done in the<br />
present study, is advantageous in some experimental situations,<br />
particularly when subjects are exercising or performing a task.<br />
To minimize missing data when the SSQ is administered orally,<br />
special emphasis must be given to responding, since there are<br />
fewer sanctions to encourage a response to each item than with<br />
other forms of questionnaire administration.<br />
These data demonstrate that subjects can provide systematic<br />
estimates of their perceived capabilities for varied tasks. Although<br />
this study did not validate subject estimates of their<br />
capabilities, the time courses of soldier capabilities appear<br />
plausible. Furthermore, the recovery of some capabilities in 5<br />
min or less emphasizes the limitations of using a "post" session<br />
measure to approximate "status" during an earlier stressful challenge.<br />
This observation also illustrates the importance of<br />
sampling at appropriate times so that the time course of a phenomenon<br />
can be accurately measured.<br />
REFERENCES<br />
Beck, A.T. Cognitive therapy for depression. New York: Guilford<br />
Press, 1979.<br />
Headquarters, Department of Army. Soldier's manual of common<br />
tasks (skill level 1), STP 21-1-SMCT, 1987.<br />
Pimental, N.A., Avellini, B.A., and Banderet, L.E. Comparison of<br />
heat stress when the Navy fire fighter's ensemble is worn in<br />
various configurations. Technical Report *, Navy Clothing and<br />
Textile Research Facility, Natick, MA (in progress).<br />
Validity of Grade Point Average:<br />
Does the College Make a Difference?<br />
Diane L. iiomnglia, ClC<br />
Jacobina Skinner<br />
Manpower and Personnel Division<br />
Air Force Human Resources Laboratory<br />
Throughout the military and private sector, undergraduate grade point<br />
average (GPA) plays an important role in job selection decisions. This<br />
measure of academic achievement and demonstrated ability is widely held to<br />
predict employee performance. Recent literature reviews show significant but<br />
modest relationships between GPA and employee performance, both in training and<br />
on the job (e.g., Dye & Reck, 1988).<br />
An issue raised by the use of GPA as a personnel selection factor<br />
concerns the possible lack of equivalence in the grade scale across colleges.<br />
The implication of these inequivalencies for employers is that expected<br />
performance would vary among job applicants who have the same GPA but who<br />
graduated from different colleges. Research on this issue is sparse, but two<br />
studies suggest that a school factor may moderate the GPA-performance<br />
relationship. Dye and Reck (1988) found correlations for graduates of<br />
the same college to be higher on average than those for graduates of different<br />
colleges. Further evidence that college characteristics may influence the<br />
predictability of GPA has been reported for Air Force officers commissioned<br />
from the Reserve Officer Training Corps (ROTC) program (Barrett & Armstrong,<br />
1989). Performance prediction was improved by considering a quality measure<br />
for the officers' college (Scott, 1984) in addition to their GPA.<br />
The current study extends the investigation of the college and GPA issue<br />
in the Air Force to a second officer commissioning source: the Officer<br />
Training School (OTS) at Lackland AFB, TX. The findings of a two-phase study<br />
of the relationship between GPAs awarded to cadets graduating from different<br />
colleges and their subsequent performance in OTS are reported. The study<br />
design was previously described by Skinner and Armstrong (1990). In the<br />
analytic phase, the initial focus was on the validity of GPA as a cadet<br />
selector. Both simple GPA effects and the joint effect of college and GPA<br />
were investigated. If differential validity for colleges was observed, the<br />
study design provided for an explanatory phase to identify the characteristics<br />
of colleges which may be responsible.<br />
Analytic Phase: GPA and College Relationships with Cadet Performance<br />
The analytic phase was conducted to answer two primary questions: 1) Is<br />
GPA a valid predictor of OTS performance? and 2) Is the relationship<br />
between GPA and performance moderated by college attended?<br />
Method<br />
Procedure<br />
Data were obtained from archival files maintained on Air Force officers.<br />
An initial sample of 11,619 cadets who entered OTS was identified.<br />
Source data for the primary predictor variables were cadets'<br />
4-year undergraduate GPA reported on a 4.0 scale and the college which<br />
conferred their baccalaureate degree. Measures of cadet performance were<br />
obtained from various phases of the 12-week OTS program. Reason for<br />
terminating training was used to generate a Pass/Fail dichotomy reflecting<br />
final training outcome for the total sample. Eight additional measures of<br />
performance were available for graduates (N = 9,858). Final Course Grade was<br />
an overall rating of academic success in the training course.<br />
Alternate models specified relationships of GPA and colleges with cadet<br />
performance that were less complex than the one hypothesized by the starting<br />
model. Possible outcomes were an interaction between GPA and college, but of<br />
a simpler functional form (either linear or curvilinear). Other alternate<br />
models provided for a joint but noninteracting effect due to GPA and college<br />
(with either a linear, quadratic, or cubic form). In these cases expected<br />
performance would differ by college at fixed GPA values, but the difference<br />
per unit change in GPA would be constant. The least complex models specified<br />
an effect due solely to GPA (linear, quadratic, or cubic) or solely to<br />
college.<br />
To isolate the "best" model, pairs of models were compared to identify the<br />
most appropriate model for each criterion.<br />
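A comparison of two nested regression models of this kind can be sketched with the extra-sum-of-squares F test. This is an assumed mechanism (the exact screening procedure is not fully detailed here); a non-significant F favors the simpler model:

```python
import numpy as np

def extra_ss_f(y, X_simple, X_full):
    """F statistic comparing a simpler model nested within a fuller model.

    X_simple's columns must be a subset (or linear subspace) of X_full's.
    """
    def sse(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    n = len(y)
    df_s = n - X_simple.shape[1]   # residual df, simpler model
    df_f = n - X_full.shape[1]     # residual df, fuller model
    # Improvement in fit per extra parameter, scaled by the full model's MSE.
    return ((sse(X_simple) - sse(X_full)) / (df_s - df_f)) / (sse(X_full) / df_f)
```

For example, a GPA-only model would play the role of `X_simple` against a model adding college membership indicators as `X_full`.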
Results of GPA Validation<br />
Simple GPA effects were found for all criteria except the Pass/Fail<br />
dichotomy. As shown in Table 1, the bivariate correlations between the<br />
performance criteria and GPA indicate low to medium-low relationships.<br />
The highest correlation was observed for Final Course Grade (r<br />
= .31, p < .01) and the lowest correlation for the Pass/Fail dichotomy (r = .01,<br />
p > .05). Because the study focused on identifying a college effect for a<br />
specific criterion only if a GPA effect was found, the Pass/Fail measure was<br />
excluded from further analyses.<br />
Joint GPA and college effects were found for all remaining criteria<br />
except the 6th week OTER. Information about both college identity and GPA<br />
made a unique contribution to prediction of the cadets' training performance.<br />
However, no interaction between GPA and college was detected. Expected<br />
training performance differed by a constant amount at all GPA levels for<br />
graduates of different colleges. The functional form of the GPA-performance<br />
relationship for colleges was linear for three performance criteria and<br />
curvilinear for four performance criteria. Figure 1 illustrates a<br />
representative finding. A curvilinear relationship between GPA and<br />
performance is depicted, and between-college differences in expected<br />
performance are shown to be the same across GPA values.<br />
Table 1. Correlations (uncorrected)<br />
of Criteria with GPA<br />
Criterion(a)               r<br />
Pass/Fail                 .01<br />
Final Course Grade        .31**<br />
CWT 1                     .19**<br />
CWT 2                     .22**<br />
CWT 3                     .21**<br />
CWT 4                     .22**<br />
CWT 5                     .18**<br />
OTER 6th Week             .07*<br />
OTER 11th Week            .20**<br />
(a) Pass/Fail N = 11,619. Other<br />
criteria N = 9,858.<br />
* p < .05.<br />
** p < .01.<br />
Figure 1. Relationship Between GPA<br />
and Expected CWT 5 Score for Different<br />
Colleges (Schools 'A', 'B', and 'C')<br />
Explanatory Phase: Characteristics Which Account for College Effects<br />
The explanatory phase was accomplished once the results of the analytic<br />
phase showed that the relationship between GPA and cadet success varied by<br />
college. The objective of this phase was to identify variables reflecting<br />
the characteristics of colleges which might underlie the combined effect of<br />
GPA and college. Of interest was whether performance variance accounted for<br />
by colleges was due primarily to the talent of students (college<br />
selectivity) or to the nature of the academic experience (educational<br />
environment). Astin (1962, 1971) showed that both classes of variables can<br />
be used to distinguish colleges, but suggested (1972) that selectivity is<br />
the more important correlate of graduates' future performance.<br />
Method<br />
Subjects<br />
The unit of analysis was colleges. Eleven of the 102 institutions were<br />
eliminated because data on all of the predictors could not be obtained.<br />
This reduced the number of colleges analyzed to 91.<br />
College Measures<br />
College Selectivity. College selectivity was defined as a measure<br />
which captured the prestige of the university as reflected by the talent of<br />
the students attracted and accepted to the college. To measure college<br />
selectivity, the average scores of the entering freshman class on<br />
standardized tests (the Scholastic Aptitude Test (SAT) and the American<br />
College Test (ACT)) were recorded. In addition, the selection ratio of the<br />
college (i.e., percentage of applicants accepted) was computed.<br />
Educational Environment. Educational environment measures reflected<br />
academic experiences provided by the university. Measures were percentage<br />
of graduate students, ratio of students to faculty, percentage of full-time<br />
faculty with PhDs, number of volumes in the library, and yearly dollar value<br />
of endowments.<br />
Procedure<br />
The sources of data for the college selectivity and educational<br />
environment predictors were various published documents reporting such<br />
statistics (e.g., American Council on Education, 1983, 1987; The College<br />
Blue Book, 1987; Lehman, 1966; National Center for Educational Statistics,<br />
1987). Data used as criteria reflected the unique contribution of the 91<br />
colleges to the prediction of OTS cadet performance. These values were the<br />
regression weights (b-weights) for the college membership binary variables<br />
from the "best" model in the analytic phase.<br />
Analysis<br />
Regression analyses were used to explore the relative contribution of<br />
the two classes of college characteristics in accounting for the college<br />
effect observed for the seven OTS performance measures. Two models, in<br />
which the b-weights for colleges were regressed on both college selectivity<br />
and educational environment measures (Model 1) and on college selectivity<br />
measures alone (Model 2), were analyzed. These models were designed to test<br />
the hypothesis that the variation in expected performance level observed for<br />
graduates of different colleges was due to college selectivity, or the talent<br />
of the student body, not to the educational environment. The predictor sets<br />
included binary and product vectors for the SAT and ACT variables in order to<br />
account for the schools (N = 51) which reported only one test score, either<br />
SAT or ACT. The predictive accuracy of the two models was compared using the<br />
F statistic (p < .01). If the models differed significantly for a criterion,<br />
stepwise regression analyses were also accomplished to identify the most<br />
salient indicators among the available educational environment measures. A<br />
backward elimination method was used to determine which educational<br />
environment measures improved predictability.<br />
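The backward elimination step can be sketched as follows. This is a simplified stand-in for the significance-based criterion the authors would have used (an SSE tolerance replaces the p-value test), and the predictor names are hypothetical:

```python
import numpy as np

def backward_eliminate(y, X, names, tol=1e-6):
    """Drop predictors one at a time while the least-squares fit barely worsens.

    Simplified criterion: stop when removing any remaining predictor would
    increase the residual sum of squares by more than tol.
    """
    def sse(cols):
        M = X[:, cols]
        beta, *_ = np.linalg.lstsq(M, y, rcond=None)
        resid = y - M @ beta
        return float(resid @ resid)

    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        base = sse(keep)
        # Find the predictor whose removal hurts the fit least.
        best_sse, drop = min((sse([c for c in keep if c != j]), j) for j in keep)
        if best_sse - base > tol:   # every removal now degrades the fit
            break
        keep.remove(drop)
    return [names[c] for c in keep]
```

In practice each candidate removal would be judged with an F or t test rather than a fixed tolerance; the loop structure is the same.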
College Selectivity Versus Educational Environment<br />
As shown in Table 2, the multiple correlations (R) for the college<br />
selectivity and educational environment measures in combination (Model 1)<br />
ranged from .45 to .66. The highest relationships (R = .60 or greater) were<br />
obtained for the Final Course Grade, CWT 2, and 11th week OTER performance<br />
criteria, and the lowest relationships for the CWT 1 and CWT 5 criteria.<br />
The two classes of college characteristic predictors accounted for about 20%<br />
to 40% of the variance (R2) in expected performance due to college<br />
attended.<br />
Table 2. Regression Analysis Results for College Characteristics<br />
                         Model 1:             Model 2:<br />
                         College Sel &        College Sel<br />
                         Academic Envir<br />
OTS Performance<br />
Criteria                 R      R2            R      R2        F<br />
Final Course Grade       .64    .42           .62    .38<br />
CWT 1                    .46    .21           .42    .17<br />
CWT 2                    .66    .43           .63    .40<br />
CWT 3                    .51    .25           .46    .21<br />
CWT 4                    .57    .32           .53    .28<br />
CWT 5                    .45    .20           .44    .20<br />
OTER 11th Week           .60    .36           .47    .22      3.19**<br />
** p < .01.<br />
Personnel managers using undergraduate GPA as a job selection factor<br />
should be cognizant that the expected future performance of employees may<br />
vary as a function of the college attended. In agencies with selection<br />
systems relying exclusively on GPA, consideration of the selectivity<br />
characteristics of individual institutions holds promise as the basis for a<br />
methodology to adjust for the college effect. Agencies with selection<br />
procedures which include a measure of each applicant’s cognitive ability<br />
(i.e., standardized test score) in addition to GPA may find that the<br />
aptitude component captures the performance variance due to college.<br />
Flight Psychological Selection System - FPS 80:<br />
A New Approach to the Selection of Aircrew Personnel<br />
H.D. Hansen<br />
Ministry of Defense, Bonn, Germany<br />
Introduction<br />
The selection of Air Force and Navy flight personnel is a progressive process, commencing<br />
before the enlistment of the candidates (Phases 1 and 2) and continuing after the normal military<br />
training (which lasts for approximately one year) into Phase 3.<br />
The first Phase is a general screening of such factors as Intelligence and Leadership qualities,<br />
carried out in the respective Officer or NCO Selection Centres.<br />
The second Phase is a preliminary flight-aptitude screening, using Computer-based psychological<br />
tests, grading candidates as broadly ‘Suitable’ or ‘Unsuitable’.<br />
The third Phase is more precise, making a final decision as to candidate suitability and further<br />
predicting what particular activity each candidate would be best suited for (e.g. Jet, WSO, Prop,<br />
Helicopter or Navigator).<br />
It consists of 3 weeks Navigation/Academic instruction, 1 week FPS 80 Selection and for those<br />
who have survived thus far, 5 weeks Flying instruction on light prop aircraft, including 18 flying<br />
hours.<br />
FPS 80 is the abbreviation for the Flight Psychological Selection System of the Aviation<br />
Psychology Section, Aerospace Medical Institute of the German Airforce.<br />
As the need was identified to improve the effectiveness and reliability of the Selection System,<br />
FPS 80 was conceptualized. It was then designed and a detailed Functional Specification was<br />
prepared, from which the required Hardware and Software was commissioned.<br />
FPS 80 was installed in July of 1987, from which time it was further tested and standardized. It<br />
was introduced as part of the selection process on the 1st April, 1990.<br />
In this paper, we will concern ourselves with a description and statistical evaluation of the FPS<br />
80 Selection system.<br />
An overview of FPS 80<br />
All those skills which are very difficult or impossible to test in the flying part of the screening,<br />
need to be evaluated, and this is the principal function of FPS 80 - to determine the particular<br />
skills of each individual candidate.<br />
These include such particular skills as the multiple tasks required of a WSO, speed of information<br />
processing, estimation abilities in formation flying, spatial orientation and visualization. FPS 80 is<br />
much better capable of categorizing these particular skills than the later Flying screening.<br />
FPS 80 makes use of a complex simulator-like device, which provides a test-environment very<br />
close to actual flying. The advantage of such a device compared with an aeroplane consists of<br />
the ability to make an objective measurement of candidate performance in a standardized test<br />
situation devoid of external distractions. In this way an adequate performance comparison<br />
between different candidates is provided. In addition a qualitative description of candidate<br />
behaviour may be formulated by observations during the test.<br />
Description of the FPS 80 Test Device<br />
Test position<br />
The two identical test positions are built to resemble cockpits. They contain a seat and the usual<br />
flight controls, viz: stick, rudder, flap-lever, gear-switch, and throttle. These are actual parts<br />
from scrapped military aircraft. In the interest of cost reduction, a stationary cockpit is used.<br />
Impressions of movement originate exclusively from visual inputs.<br />
Conventional flight instruments are depicted on an instrument panel. Three colour VDUs appear<br />
above the panel. These represent the view from the cockpit. The view forwards covers a<br />
landscape of approximately 80 kilometres square. The view includes an airfield and the<br />
surrounding landscape. The scale of the depicted landscape represents the cockpit’s current<br />
displacement from it; the speed of change of a display represents the speed of the cockpit and<br />
perspective of the objects displayed represents the current orientation of the cockpit. In this way<br />
a realistic impression of motion is conveyed to the candidate.<br />
In the lower third of the central VDU the following instruments are displayed: power, airspeed,<br />
compass, horizon, altimeter, vertical speed indicator and G-meter.<br />
The cockpits additionally have a control and warning-panel that gives information about such<br />
things as landing gear (up/unsafe/down), flaps (up/down), parking brakes, stall warning. An<br />
input key-pad is found on the right side. The performance characteristics mirror those of a<br />
standard single engine machine. System parameters may be changed to simulate other machine<br />
types. The two cockpits operate independently of one another.<br />
System Configuration<br />
The FPS 80 system comprises 7 computers linked by a network. One of these is a central<br />
computer and each cockpit is driven by three more. Tests are controlled from the central<br />
computer console from where the test supervisor can start the different test programs, communicate<br />
with the candidates and monitor their progress. He can additionally intercept their visual<br />
displays, and speak with the candidates, singly or severally, by radio link. The candidate<br />
performance data is returned to the central computer where it is stored on tape, later to be<br />
processed on an external computer in combination with the results of the other screening<br />
procedures, to produce a composite performance profile for each candidate. The results from<br />
all candidates may then be statistically analysed.<br />
Test procedure<br />
The FPS 80 Test Procedure consists of 5 missions. Each candidate receives a standard briefing<br />
from the instructor before beginning each mission. A mission is built up of various distinct<br />
manoeuvres, usually starting and ending with a take-off and landing. Every mission has three<br />
phases, viz:<br />
1) Demonstration Phase<br />
The control sequences and instruments required for each mission are first explained and<br />
demonstrated. An ideal mission performance is then displayed on the screens and described in<br />
pre-recorded standardized form over the acoustic system.<br />
2) Practice Phase<br />
In the second phase, the candidate attempts the manoeuvre himself with assistance both from<br />
the system (pre-recorded warnings) and from the instructor (optional intervention). The computer<br />
monitors his performance and generates warnings when it strays too far from the optimal<br />
one. (Tolerances are adjustable.) Should his performance diverge unduly, the manoeuvre is<br />
interrupted and starts anew (up to three times).<br />
3) Test Phase<br />
There is no intervention or assistance during the test phase. The only acoustic inputs are normal<br />
Controller communications. The same tolerances apply as during the practice phase, and<br />
automatic interruption and restart will occur in the same way.<br />
The candidate’s behaviour is additionally under observation during this phase by a Flight<br />
Psychologist, who subsequently completes an observation log of his performance.<br />
Description of Missions<br />
Mission FPS 01:<br />
- Introduction to the function of the video system, controls and flight instruments.<br />
- Taxiing, Takeoff with Abort, renewed Taxiing to “Number 1 Position”, Take-off and climb<br />
to Pattern-level (1000 ft AGL), Straight and level flight.<br />
- Turns with 20° of bank and 90° direction change. - Turns with 40° of bank and 180° direction<br />
change. - Turns with 60° of bank and 360° direction change.<br />
- Automatic return flight to the airfield with landing.<br />
Mission FPS 02:<br />
- Consists of pattern flying and landings.<br />
Mission FPS 03:<br />
- Take-off and climb to pattern level. Leaving the pattern over the NZP (Navigational Zero<br />
Point) to commence flight proper.<br />
- Navigation flight (1000 ft AGL) with location of targets and the solution of additional tasks<br />
(calculation of course and flight duration per leg). Finally return to airfield and land.<br />
Mission FPS 04:<br />
- Take-off and climb to pattern level. The plane will then be automatically positioned at 6000<br />
ft AGL.<br />
- Recovery from unusual attitudes (nose-up/nose-down). The manoeuvre must be performed<br />
at 5000 ft on a prescribed course and within a given time interval.
- Pursuit of a leading plane such that a given separation is maintained at all times.<br />
- Homing in on a target, pursuit and attack of another plane.<br />
- Finally return to airfield and land.<br />
Mission FPS 05:<br />
An endless tunnel appears on the screen comprising a series of concentric squares and a white<br />
line approaching the viewer through the center of the bottom edges. The squares appear to<br />
approach the viewer by diverging from the centre. The apparent speed of approach of these<br />
squares (which remains constant) simulates the speed of flying through the tunnel. Rotation of<br />
the squares transmits a sensation of banking in the opposite direction. Similarly changes in the<br />
relative displacement of opposite sides (left and right for the tunnel bending, top and bottom for<br />
the tunnel rising and dipping) create effects of the tunnel changing direction and orientation.<br />
These effects communicate themselves to the candidate not as changes in the tunnel however,<br />
but as changes in the attitude of the plane. The alignment of the squares can be restored by the<br />
appropriate control inputs, which in turn restores the impression of level flight.<br />
These effects are accentuated by examination pressure and the feeling of sensory deprivation<br />
caused by a closed cockpit. This is so realistic to some candidates that they experience a sensation<br />
of air-sickness.<br />
Statistical Evaluation<br />
The evaluation of the missions is performed in 3 steps, viz:<br />
1) Data compression<br />
2) Determination of correlations between FPS missions and flight performance in the Screening.<br />
3) Calculation of transformed test results.<br />
Table 1 gives an overview of the number of variables to be processed from each mission.<br />
Table 1: Number of variables per mission<br />
Mission 01: 7 sections of 11 variables.<br />
Mission 02: 11 sections of 11 variables (times 3 circuits).<br />
Mission 03: 18 sections of 11 variables.<br />
Mission 04: 16 sections of 11 variables.<br />
Mission 05: 9 sections of 9 variables.<br />
This gives a total of 895 processed variables for all 5 missions. This implies ca. 130 kByte raw<br />
data per candidate. Thus a condensation of data is necessary to enable evaluation. (Details of<br />
this condensation procedure are to be found in an exhaustive paper on the subject shortly to be<br />
published in the “Wehrpsychologische Untersuchungen”.). The condensation required 60 hours<br />
of processing time, and the results were stored in 7 data banks for later ease of access.<br />
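Table 1’s bookkeeping can be checked directly; a one-line tally (mission 02’s sections are multiplied by its 3 circuits):<br />

```python
# (sections, variables per section, repetitions) for missions 01-05;
# mission 02 is flown as 3 identical circuits.
missions = [(7, 11, 1), (11, 11, 3), (18, 11, 1), (16, 11, 1), (9, 9, 1)]
total_variables = sum(s * v * reps for s, v, reps in missions)
print(total_variables)  # 895
```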
In the second stage of evaluation, correlations were made between the individual variables and<br />
the results from the later Flying screening. In this way the variables best able to predict the<br />
results of the Flying screening were high-lighted.<br />
In the third stage of evaluation, based on a regression analysis of the most predictive variables<br />
from stage 2, a representative value for each candidate was calculated for the individual sections<br />
of each mission, and also for each complete mission (or in the case of mission 02, for each circuit<br />
of the mission).<br />
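Stages 2 and 3 amount to screening variables by their criterion correlations and then combining the survivors by regression. A minimal sketch of the screening step, with invented variable names and data (not FPS values):<br />

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def screen_variables(variables, criterion, keep):
    """Rank named mission variables by |r| with the screening
    result and keep the most predictive ones (stage 2)."""
    ranked = sorted(variables.items(),
                    key=lambda kv: abs(pearson_r(kv[1], criterion)),
                    reverse=True)
    return [name for name, _ in ranked[:keep]]

# Invented performance variables for six candidates and their
# (hypothetical) Flying screening scores.
flying = [55, 60, 48, 70, 52, 66]
variables = {
    "altitude_deviation": [9, 6, 12, 3, 10, 5],  # tracks criterion inversely
    "heading_error":      [8, 7, 9, 2, 9, 4],
    "keypad_speed":       [5, 5, 6, 5, 5, 6],    # nearly unrelated
}
best = screen_variables(variables, flying, keep=2)
```

In stage 3 a regression would then be fitted on the retained variables to yield one representative value per section and per mission.<br />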
The values thus calculated for each complete mission were then correlated with the<br />
results of the Flying screening. All correlations were highly significant, but differed widely<br />
between missions. The fact that the first mission had a lower correlation could perhaps be<br />
explained by the unfamiliarity of the test environment at this early stage of FPS screening.<br />
In a fourth evaluation stage, the fall-out frequencies during Flying screening within groupings<br />
of candidates with similar FPS performances were computed. Table 2 (next page) shows clearly<br />
that candidates with low FPS performances frequently failed the Flying screening.<br />
The second mission appears to have been particularly predictive. Mission 3 and 4 show<br />
irregularities in the middle stages, which could perhaps be explained by the fact that some of<br />
the skills being tested in these missions, do not play a part in the Flying screening.<br />
Those particularly at risk in the Flying screening are candidates who scored below 52 in the<br />
FPS (most candidates scored in the range 40 to 76). The group of candidates with the best FPS<br />
results (> 69 FPS points), on the other hand, had a 90% success-rate in the Flying screening.<br />
Conclusion<br />
After exhaustive statistical evaluation, it was possible to conclude that the FPS was capable of<br />
predicting success or failure at Flying screening with acceptable accuracy.<br />
A final evaluation of the success of this method of screening (remembering that the full screening<br />
process consists of all five stages in Phases 1 to 3, as at present Flying screening is being retained)<br />
will only be possible after the collection of sufficient statistical evidence of candidates’ subsequent<br />
performance in training and later operational flying. The same applies, of course, to the<br />
other in-flight disciplines for which FPS screening takes place.<br />
To date, second-rate pilot-candidates have been channelled into positions as Weapon System<br />
Officers and Navigators. It is hoped that the specific results of FPS missions 3 and 4 will show<br />
a better correlation with subsequent candidate skills in these specialist activities.<br />
Table 2<br />
FPS 80 test results and attrition rates in Flying screening<br />
                          Test results<br />
               < 52  52-57  58-63  64-69   > 69  total<br />
mission 1<br />
no of cand.      46     50     98    120     43    387<br />
attritions       23     11     17     18      1     70<br />
percentage      50%    22%    17%    15%     2%    23%<br />
mission 2a 1)<br />
no of cand.      43     44     71     60     56    274<br />
attritions       26     14     15      5      3     63<br />
percentage      60%    32%    21%     8%     5%    23%<br />
mission 2b<br />
no of cand.      44     33     73     67     57    274<br />
attritions       24     18      8      9      4     63<br />
percentage      55%    55%    11%    13%     7%    23%<br />
mission 2c<br />
no of cand.      37     37     63    100     37    274<br />
attritions       23     12     17     10      1     63<br />
percentage      62%    32%    27%    10%     3%    23%<br />
mission 3<br />
no of cand.      35     23     31     34     32    155<br />
attritions       24      5      2      6      1     38<br />
percentage      69%    22%     6%    18%     3%    25%<br />
mission 4<br />
no of cand.      21     16     24     23     40    124<br />
attritions       13      2      3      3      2     23<br />
percentage      62%    13%    13%    13%     5%    19%<br />
mission 5<br />
no of cand.      19     13     25     27     30    114<br />
attritions        9      3      4      3      0     19<br />
percentage      47%    23%    16%    11%     0%    17%<br />
1) mission 2 consists of 3 identical patterns<br />
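The percentage rows in Table 2 are plain per-band attrition rates; the arithmetic can be reproduced as follows, using the mission 3 counts from the table:<br />

```python
def attrition_percentages(candidates, attritions):
    """Per-band attrition rate, rounded to whole percent as in Table 2."""
    return [round(100 * a / c) for a, c in zip(attritions, candidates)]

# Mission 3 counts per FPS score band (< 52, 52-57, 58-63, 64-69, > 69)
cands = [35, 23, 31, 34, 32]
fails = [24, 5, 2, 6, 1]
rates = attrition_percentages(cands, fails)     # [69, 22, 6, 18, 3]
overall = round(100 * sum(fails) / sum(cands))  # 25 (total column)
```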
Leadership in Aptitude Tests and in Real-Life Situations<br />
A. H. Melter & W. Mentges<br />
Federal Armed Forces Central Personnel Office, Köln,<br />
Federal Republic of Germany<br />
Introduction<br />
In the aptitude testing of German volunteers for officer and NCO careers small groups<br />
of three or four applicants are given planning tasks to work out sequences for action or<br />
to organize items of information. The applicants have to produce an individual draft of<br />
their task solution. They prepare and give the group a short presentation of some aspects<br />
of the planning tasks, and they have to discuss and to decide their tasks and their<br />
individual solutions at a round table.<br />
Officer applicants must organize<br />
- a leisure activity,<br />
- a floor-plan of a supermarket,<br />
- the land utilization and development of a small town,<br />
- a school prize-giving day, or<br />
- a meeting place for young people.<br />
The rating sheet for group tasks is subdivided into four paragraphs:<br />
- The “written plan” section for making notes on the contents, presentation, accuracy,<br />
and lay-out.<br />
- The “short presentation” section for making notes on comprehensibility, behavior, and<br />
argumentation.<br />
- The “round table discussion” section for making notes on social interactions, plans,<br />
decisions, and behavior in changing situations.<br />
- The “overall rating of the group task” section for making notes on assertiveness, social<br />
competence and cooperation, argumentation and verbal expression, planning and<br />
decisiveness.<br />
The scale is defined as follows:<br />
1 Very good, obviously positive, clearly more positive characteristics than<br />
usually expected;<br />
2 Good, clearly above average, more positive than negative characteristics;<br />
3 Completely satisfactory, somewhat above average, more positive than<br />
negative characteristics;<br />
4 Satisfactory, average, positive and negative characteristics are balanced;<br />
5 Adequate, somewhat below average, more negative than positive<br />
characteristics;<br />
6 Just adequate, clearly below average, more negative than positive<br />
characteristics;<br />
7 Unsatisfactory, obviously negative, clearly more negative characteristics<br />
than usually expected.<br />
The computer-assisted planning tasks and computer-simulated planning games consist<br />
of a comparable matrix of methods and dimensions, too (Melter & Geilhardt, 1989). As<br />
a rule military raters use the aptitude criteria achievement, social competence and<br />
cooperation, argumentation, planning and decisiveness. These concepts describe a<br />
range of characteristics denotable as leadership in small groups.<br />
The problem of behavior prediction<br />
Now, psychological aptitude researchers and military raters are confronted with the<br />
problem of whether real-life behavior in squads, platoons or companies can be predicted<br />
from task-generated behavior in artificial testing conditions.<br />
While the predictor situations are sufficiently described, the criteria referring to careers<br />
and jobs still have to be clarified to solve the prediction problem. Psychological research<br />
normally makes use of analyses of job demands. Such analyses produce the criteria by<br />
which leadership, for example in squads, platoons and companies, can be assessed by<br />
other military personnel (instructors, superiors) and teachers at the officer schools, at<br />
the universities of the German Federal Armed Forces, and in field appointments. If the<br />
predictors and criteria are similar and comparable, the results of such analyses will be<br />
reliable and valid. But if there are great differences between both situations, the<br />
psychological aptitude research unit and the personnel department have to look for the<br />
central personal constructs of the criterion situations. But neither psychologists nor<br />
military users are able to claim to have discovered them with hundred per cent reliability.<br />
Use of real-life situations to establish job demands<br />
One approach to establishing career or job demands translatable into measurements with<br />
psychological methods is to issue questionnaires to officers at the officer schools,<br />
the Bundeswehr agencies and in field appointments. In first analyses we used repertory<br />
grid techniques to question 25 military raters and staff officers from the Central<br />
Personnel Office, 17 officers from the Air Force Officer School, and 15 officers from<br />
the Army Officer School about their personal constructs of apt and inapt young officers.<br />
The aim of those studies was to elicit the implicit aptitude theories of these officers about<br />
the new officer generation as experienced in their own job environment (Mentges, 1989).
A further objective was to produce a diagnostic process model for determining and<br />
evaluating the aptitude criteria for selecting officer applicants (Behling & Neubauer,<br />
1990). We intend to question officers in the field, too.<br />
The personal constructs are defined in behavioral terms. However, the method does not<br />
allow one to work out unambiguously in which situation the behaviors defined have what<br />
kind of results, success or failure, for the man concerned. Such distinctions are only<br />
possible if we ask about so-called “behavior - situation - results - triangles” in real-life<br />
environments. This means asking about typical situations, about behavior in such<br />
situations and about the effects of this behavior, for example on the soldiers in the squad<br />
entrusted to the officer candidate for the first time in the training unit.<br />
When asking about typical situations for leadership we have to differentiate enormously.<br />
Firstly, the size of the military groups (squad, platoon, company) and the responsibilities<br />
increase during someone’s career.<br />
Secondly, typical situations in peace time, in periods of tension, and in war are different.<br />
Thirdly, we have different typical situations indoors and outdoors. Many further<br />
distinctions are imaginable. It is essential for our problem that while leadership in a<br />
small group will result in success, the same behavior might not be successful in a war<br />
situation. In threatening situations where prompt, precise, and right action is necessary,<br />
there is a need for different leadership qualities from those in situations where there is<br />
no stress (Cardoso de Sousa, 1990).<br />
Predictions for normal and dangerous situations<br />
All military experts concerned with such topics assume that they are unable to reliably<br />
predict leadership in war or to predict the character of that type of officer who would<br />
in fact be able to lead successfully in war simply because the speed, variety, and<br />
unforeseeability of events and behaviors in such crucial circumstances are beyond<br />
precise description and simulation (Oetting, 1988).<br />
On the other hand, there is some evidence that people with a certain pattern of basic<br />
abilities will most probably be unable to hold their own in typical situations. For the<br />
moment, we have left out of consideration the fact that a certain pattern of skills and<br />
knowledge can be generated by training and education.<br />
The psychological and medical assessments of such basic patterns conjoined with the<br />
prediction of success in typical situations are difficult enough, but the educational<br />
assessment of the increase achieved by training and education is incomparably more<br />
complicated.<br />
Let us take an example out of the domain of survival. The analyses of reports given by<br />
survivors of accidents have shown that<br />
- their belief in being rescued,<br />
- the fact that they did not panic,<br />
- their good morale, and<br />
- their will to survive,<br />
each demonstrated in behavior, enhanced their chances of survival (Röder & Minich,<br />
1987).<br />
Of these four psychological characteristics, only morale and will-power can perhaps be<br />
detected in a basic assessment of volunteers. How can we assess whether the morale and<br />
will-power of soldiers can be increased through training and education to such an extent<br />
that they could survive dangerous situations? It is extremely difficult to predict such<br />
an “ultimate” criterion. And because of that we are unable to base a selection and<br />
placement model on aptitude criteria for extreme situations.<br />
It is by no means the case that predictions for “normal” situations are considerably less<br />
difficult than the predictions for dangerous situations. You only have to think of the<br />
quite “normal” prediction of the superior’s ratings at the end of any military training<br />
course, and of the many imponderable factors that can influence the aptitude and<br />
performance rating of an officer candidate.<br />
The environmental factors accompanying military operations, space missions, rescue<br />
operations, or sports activities can - as dangers - drastically affect the behavior of<br />
individuals concerned and have consequences for the life and limb of both those in<br />
charge and their teams. Although predictions are very difficult, psychologists remain<br />
under an obligation for ethical reasons to contribute to predictions by researching into<br />
aptitude criteria and the characteristics of poorer performance and performance enhancement<br />
due to training, in order to improve the selection, the training, and mission<br />
accomplishment with psychological methods.<br />
One example from the domain of sports activities serves as clarification: when dangerous<br />
situations in mountaineering have been analyzed retrospectively from a psychological<br />
viewpoint, it has been noticed that some behavioral characteristics of the men at<br />
risk brought about the potential accidents of guided groups:<br />
- careless and technically deficient safety measures;<br />
- failure to give precise orders, if any at all;<br />
- unrealistic over-estimation of one’s technical skills and fitness;<br />
- euphoria or fatigue combined with decreasing attention;<br />
- arguments and annoyance.<br />
Accidents happen with increasing probability if such behavioral characteristics appear<br />
in the group, and if environmental factors interact in a fateful manner: The guide climbs<br />
a rock passage with crumbling grips and steps; the second member of the group fails to<br />
take adequate securing measures and at the same time chatters to the third member of<br />
the group without observing the guide, who for his part fails to give precise and pressing<br />
instructions to the group to do things right.<br />
The behavioral result of the leader may be a fall, if the environmental factor “loose grip”<br />
comes to bear, a fall which could mean the fall of the whole group because of the<br />
incomplete and inattentive securing, with fatal consequences for all the members. The<br />
guide should be advised to pay attention to the reliability of the members when selecting<br />
his group, to insist on a short check of their communication and securing skills, and to<br />
attach importance to precise and prompt instructions during the climb.<br />
Results of previous job analyses<br />
Which criteria resulting on the one hand from surveys and on the other from real-life<br />
situations can be provided by aptitude psychologists for a basic assessment in order to<br />
get concepts and measurements of leadership in small groups of aptitude testing?<br />
Surveys with officers from different divisions of the Central Personnel Office (Mentges,<br />
1990) point in a very definite direction that can be paraphrased with<br />
- personal authority and the attending executive techniques,<br />
- assertiveness, taking consideration of the situation and of the people involved,<br />
- cooperation in the sense of commitment to the success of the team,<br />
- comradeship and care,<br />
- courageous and honest acceptance of responsibility.<br />
In any case, it does not include the ability to cause conflicts and to test the extent to which<br />
such conflicts can be endured and managed. Ideas of this kind should have been<br />
discarded once and for all from modern group psychology.<br />
References<br />
Behling, A. & Neubauer, R. (1990). Eignungsmerkmale Offizierbewerber. (Aptitude criteria for officer<br />
applicants). Abschlußbericht der Industrieanlagen-Betriebsgesellschaft mbH. Ottobrunn.<br />
Cardoso de Sousa, FJ.V. (1990). Leadership under stress: Immediate effects of the aggressive style. Paper<br />
presented at the I.A.M.P.S. conference. Vienna.<br />
Melter, A.H. & Geilhardt, T. (1989). Computer-assisted problem solving as assessment method. Proceedings<br />
of the 31st annual conference of the Military Testing Association (pp. 129-134). San Antonio:<br />
Air Force Human Resources Laboratory.<br />
Mentges, W. (1989). Implizite Eignungstheorien als Bestandteil der Anforderungsanalyse im Assessment<br />
Center. (Implicit aptitude theories as part of job analyses for assessment centers). Diplomarbeit im<br />
Fach Psychologie an der Philosophischen Fakultät der Rheinischen Friedrich-Wilhelms-Universität<br />
Bonn.<br />
Mentges, W. (1990). Die Erhebung von impliziten Eignungstheorien als Beitrag zur Anforderungsanalyse<br />
für den Offizierberuf. (Survey of implicit aptitude theories as a contribution to job analysis). Köln:<br />
Arbeitsbericht des Personalstammamtes der Bundeswehr.<br />
Oetting, D.W. (1988). Motivation und Gefechtswert - Vom Verhalten des Soldaten im Kriege. (Motivation<br />
and combat effectiveness - On the behavior of soldiers in war). Frankfurt und Bonn: Report<br />
Verlag.<br />
Röder, K.-H. & Minich, I. (1987). Psychologie des Überlebens - Survival beginnt im Kopf. (The<br />
psychology of survival - Survival begins in the mind.) Stuttgart: Pietsch.<br />
Computer-based Assessment of Strategies in Dynamic Decision Making<br />
Wiebke Putz-Osterloh<br />
University of Bayreuth<br />
1. Introduction<br />
In psychological testing, computers are used primarily as economically efficient tools to<br />
administer tests and to analyze and store individual data. This type of testing based on classical<br />
tests is not the subject of this paper, however. Instead, I intend to speak about the uses of<br />
computer programmes to simulate complex situations that call for dynamic decision making<br />
(Kleinmuntz, 1985) or for complex problem solving as Dörner defines it (1978). In the<br />
following, I will first discuss some reasons for complementing classical tests of intelligence<br />
by other methods to extend the range of intellectual demands. Secondly, I will mention three<br />
conditions that should be controlled if one intends to assess individual differences in complex<br />
situations. Then I will summarize empirical results concerning individual differences in<br />
problem solving strategies. Finally I will discuss some difficulties encountered in estimating<br />
the external validity of strategies.<br />
2. Reasons for expanding approaches to intelligence testing<br />
Classical tests of intelligence (whether computer-based or conventional paper-and-pencil)<br />
suffer from some common restrictions with respect to the intellectual demands they cover:<br />
- Test items are static: Items have to be answered independent of the answers given previously<br />
or to be given later.<br />
- Test items are transparent and well defined: Individual differences in knowledge used and<br />
strategies applied must be eliminated to make sure that only one single solution to each item<br />
can be evaluated as the correct one.<br />
- Intelligence is measured by the sum of the correct solutions to items that are to be solved as<br />
quickly as possible: Time consuming processes such as the use of heuristic strategies are not<br />
analyzable.<br />
- Answers to test items have to be selected rather than constructed: Although in real-life
situations the rule is that one has to search for decision alternatives first and to select one of<br />
these afterwards, such search processes are excluded from test-intelligence.<br />
- Applicants often do not accept tests of intelligence as valid or fair predictors for personnel<br />
selection. One approach to overcome the restrictions mentioned is to assess individual<br />
behavior in multiple “real-life situations and exercises” as it is conceptualized by assessment<br />
center methods.
3. Conditions for the assessment of individual differences in decision making strategies<br />
The following conditions are not met, or not even controlled, when assessment center methods are used:
- In complex situations there are multiple goals to be reached. Individual differences in decision<br />
making depend on specific defined goals. If individual behavior is to be assessed, the goals<br />
of each subject have to be controlled; otherwise the effectiveness ratings of individual<br />
362
behavior will be invalid. This condition is violated in unstandardized group discussions and
in role-taking games.<br />
- In complex situations different strategies are possible, leading to different outcomes. Therefore,
data on strategies should give more information about individual differences than<br />
performance scores alone would; otherwise the analysis of performance alone would suffice.<br />
In assessment center exercises, data on strategies and on performance are interrelated.
- Analyses of strategies are most informative if the data are not predicted by classical tests or<br />
other performance scores. They are useful if they are generalizable to other situations.<br />
4. Empirical studies using simulated dynamic situations*<br />
4.1 The simulated situations and their demands<br />
In our empirical studies two different simulated situations are used. The first system simulates<br />
a small industrial company which produces and sells textiles. The system consists of 24<br />
variables, of which 11 input-variables can be changed directly by decisions made by the<br />
subjects, including the volume of raw materials to be bought, the selling prices, the amount of<br />
advertising, the number of workers, etc. Subjects are asked to aim at three goals while
controlling the system:<br />
-to make as much profit as possible,<br />
-to increase the company’s capital from beginning to end,<br />
-to pay the workers the highest wages possible.<br />
These three goal variables are used in combination to rate the performance in system control.<br />
The subjects are asked to control the system for 15 simulated months and to decide what<br />
changes should be made in what input variables. As the experimenter operates the computer,<br />
the subjects have to ask questions about the actual state of the variables, and to communicate<br />
their decisions to the experimenter. So, while the subjects are thinking aloud, data can be gained<br />
in quite a natural manner.
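The flavor of such a dynamic-system task can be conveyed by a deliberately tiny caricature in code. The sketch below is only illustrative: all variable names, update rules, and the way the three goal variables are combined into a performance score are invented assumptions, not the actual 24-variable simulation used in the studies.

```python
# A deliberately tiny caricature of the simulated company described above.
# All names, update rules, and the combined performance score are invented
# for illustration; the real system has 24 interconnected variables.
class TinyCompany:
    def __init__(self):
        self.start_capital = 100_000.0
        self.capital = self.start_capital
        self.profit = 0.0
        self.wages = 1_200.0   # input variable: monthly wage per worker
        self.price = 50.0      # input variable: selling price
        self.workers = 8       # input variable: number of workers

    def step(self):
        # The system is dynamic: the state changes every simulated month,
        # and the effect of an input depends on the current state.
        produced = self.workers * 20
        demand = max(0, int(600 - 5 * self.price))
        sold = min(produced, demand)
        self.profit = sold * self.price - self.workers * self.wages
        self.capital += self.profit

    def performance(self):
        # Combine the three goal variables into a single control score.
        return (self.profit
                + (self.capital - self.start_capital)
                + self.wages * self.workers)

company = TinyCompany()
for month in range(15):        # 15 simulated months, as in the study
    if company.profit < 0:     # a minimal "strategy": raise the price
        company.price += 1.0   # whenever the last month made a loss
    company.step()
print(round(company.performance(), 2))
```

Even in this toy version the defining property survives: the same decision (a price increase) has different effects depending on the current system state.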
The second system simulates a forest region that is to be protected against fire. The subjects<br />
are asked to take on the role of a fire chief, giving different commands to 12 fire fighting units<br />
(using a mouse). The forest, the units, and the fires are displayed on a graphics terminal in front<br />
of the subjects. The goal is to minimize the area that is burned down (see also Brehmer, 1987).<br />
Again, the same criterion is used to rate performance. The system is to be controlled for 100<br />
time intervals which maximally last one minute each.<br />
Despite differences in the mode of control the two systems have the following demands in<br />
common:<br />
* This research is supported by grants from the Federal Ministry of Defense in Bonn, Federal<br />
Republic of Germany<br />
363<br />
(a) The systems are complex: This means that they contain many variables that are interconnected<br />
by a relational network rather than by single unidirectional relations. Given one input<br />
change, the network between the variables causes not only a main effect but several side effects<br />
that also have to be taken into consideration.<br />
(b) The systems are nontransparent: This means that the relational network connecting the<br />
variables one to another is not shown to the subjects. Therefore the subjects have to generate<br />
hypotheses about the effects of their decisions, which they should then test against the feedback<br />
data.<br />
(c) The systems are dynamic: This means that the variables change their state over time, even<br />
if there is no input change. As a consequence, the effects of input changes differ depending on<br />
the actual system states.<br />
(d) The systems are meaningful: This means that the variables and their interrelations are<br />
implemented in a system to correspond to a domain of reality. The subjects can use their<br />
domain-related knowledge to generate hypotheses.<br />
4.2 Control strategies and derived measures<br />
Due to the differences between these demands and the demands of test items, it is to be expected,<br />
and is substantiated by empirical data, that performance in system control is not predictable by<br />
intelligence test scores (see Dörner & Kreuzig, 1983; Dörner, 1986; Funke, 1983; Putz-Osterloh,<br />
1981).<br />
As Dörner (1986) argues, strategies in system control are determined by a superordinate type<br />
of intelligence, the so-called “operative intelligence”. This type of intelligence refers to the<br />
construction and adaptive use of, and control over, subordinate processes such as information<br />
gathering, hypothesis testing, planning, and decision making.<br />
Different parameters of individual strategies are combined to evaluate the control over<br />
subordinate processes, e.g. the frequency of correct verbalized hypotheses (correspondent to<br />
system reality) and the rareness of false or irrelevant ones. These parameters are analyzed and<br />
summed up over subsequent time intervals to evaluate control and adaptation over time.<br />
In the following, two examples of complex abilities which can be diagnosed from decision-making
and from thinking-aloud data are defined and operationalized.
Ability to organize<br />
High organizing ability is defined by the frequency of prospective decisions to prevent<br />
undesirable system states, the rareness of false decisions, and the coordination of different<br />
decisions to reach more than one goal.<br />
In the economic system, the following parameters are combined: the rareness of isolated<br />
decisions, the frequency of central decisions which directly influence one goal variable, and<br />
the frequency of coordinated (in relation to the goal variables) decision patterns over time.<br />
364
In the fire fighting system prevention is realized by the number of units distributed over the<br />
area before a fire is seen. False decisions mean forgetting to let the units search for and put out<br />
fires by themselves.<br />
Coordination is measured simply by the number of changing commands in the face of new<br />
fires throughout the game.<br />
Ability to decide<br />
A high degree of decision-making ability means the capability to plan in a goal-directed manner<br />
and to realize decisions quickly and precisely.<br />
The following aspects are combined in the economic system: The time to control the system,<br />
the frequency of postulated correct effects of decisions, and the rareness of decisions that do<br />
not work in the system.<br />
In the fire fighting system, the speed and accuracy of decision-making are rated in combination.<br />
This means that the number of new fires that are dealt with in precise commands are summed<br />
up and weighted by the average time lag between the time of the fire and the time that the<br />
corresponding command is given.<br />
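The speed-and-accuracy measure just described might be sketched as follows. The event encoding and the exact weighting function are assumptions, since the source gives only a verbal description of the measure.

```python
# Sketch of the decision-making score for the fire-fighting task: precise
# responses to new fires are counted and weighted by the mean lag between
# outbreak and command. The event encoding and the weighting function are
# assumptions; the source describes the measure only verbally.
def decision_score(events):
    """events: (fire_time, command_time) pairs; command_time is None
    when a fire was never answered with a precise command."""
    handled = [(f, c) for f, c in events if c is not None]
    if not handled:
        return 0.0
    mean_lag = sum(c - f for f, c in handled) / len(handled)
    # more fires handled and shorter lags both raise the score
    return len(handled) / (1.0 + mean_lag)

fires = [(3, 5), (10, 11), (20, None), (31, 35)]  # times in intervals
print(round(decision_score(fires), 3))
```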
4.3 Empirical results<br />
4.3.1 Estimates of reliability<br />
Concerning the economic system, retests between different trials of system control do not seem<br />
to be appropriate. Here we can expect content related changes in strategies that may influence<br />
performance without being attributable to a lack of reliability. Empirical results from two<br />
studies show stability of medium-level strategies, either accompanied or not accompanied by<br />
stability in performance (see Strohschneider, 1986; Funke, 1983).<br />
In the fire fighting system, content-dependent changes in strategies are not to be expected. In<br />
one experimental study (N = 50 university students) two versions of system parameters were<br />
constructed which differed in the number and timing of new fires. The subjects had to control<br />
each version for three trials in one session each. The correlations are lower between the first<br />
set of trials than between the second set. Between the last two trials in the second version, all<br />
correlations are higher than .80, referring to performance as well as to organizing and<br />
decision-making ability. In a second study (N = 80 university students), one system version<br />
had to be controlled for four trials. Performance data between the last two trials are correlated<br />
.84, whereas organizing and decision-making ability is correlated .79 and .76, respectively.<br />
These data on stability are accompanied by significant gains in performance as well as in<br />
strategies from the first to the last trial. This is equally true for both studies.<br />
4.3.2 Data on internal validity<br />
As has been mentioned above, it should be tested whether the subjects are aiming at comparable<br />
goals. If the goals are only vaguely defined, the subjects will probably define different specific<br />
goals for themselves. Consequently, in our studies the subjects are given specific goal variables<br />
which should be influenced in a specified direction. The objectively defined performance is<br />
correlated with subjectively rated success after system control. In two studies all correlations<br />
are highly significant: For the economic system the correlation is .52 (N = 100) and .48 (N =<br />
48), and for the fire system it is .75 (N = 50) and .79 (N = 80).<br />
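These validity coefficients are ordinary product-moment correlations. As a reminder of what is being computed, here is a minimal Pearson r on invented data; the actual scores and sample sizes are not reproduced here.

```python
# Minimal Pearson product-moment correlation, the statistic behind the
# validity coefficients above. The data below are invented; the actual
# scores and sample sizes are not reproduced here.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

performance = [12, 30, 24, 18, 40]   # objective control scores (invented)
rated_success = [2, 4, 3, 3, 5]      # subjective ratings (invented)
print(round(pearson_r(performance, rated_success), 2))
```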
An important characteristic of performance in system control refers to its partially ambiguous<br />
meaning as performance level is not equivalent to a specific strategic variant. Following these<br />
arguments, the internal validation of identified strategies should be the proof that these<br />
differences are systematically related to performance level, whereas different strategic measures<br />
should not be correlated too highly. In our studies there is clear evidence that differences<br />
in strategies are systematically related to performance. Between studies there are differences<br />
in the amount of common variance. In the economic system, decision-making and organizing<br />
ability is correlated positively with performance, but this is not always significant (decisiveness:<br />
.28 and .20; organizing ability .50 and .13; N = 48, N = 100). For fire fighting the<br />
correlations are higher: decision-making ability with performance .58 (N = 50) and .45 (N =<br />
80); organizing ability .58 (N = 50) and .63 (N = 80).<br />
Further questions aim at the relation between the two strategic measures. In the economic<br />
system, no systematic correlation between decision-making and organizing ability is found (N<br />
= 48, N = 100), whereas in fire fighting there is either no relationship at all or no more than<br />
9% common variance between the two measures (N = 80; N = 48). Finally, the generalizability<br />
of strategies and performance between the two systems was also tested. In two independent<br />
studies, one group of subjects first controlled the fire system and then the economic system,<br />
while the other group worked the systems in the reverse order. A sequence effect is replicated<br />
in the two studies: If the subjects control the fire system first and the economic system<br />
afterwards, differences in performance as well as in organizing ability are correlated systematically<br />
between the two systems (N = 25; N = 30), whereas no systematic correlations are<br />
found if the systems are controlled in the reverse order. Despite some ambiguities in interpreting<br />
the sequence effect, these data give evidence of the generalizability of strategies in system<br />
control.<br />
4.3.3 Data on external validity<br />
If the systems do represent valid dynamic situations, experts in the simulated domain of reality
should do better in controlling a system than novices do. As is to be expected, in two
independent studies (Putz-Osterloh, 1987; Putz-Osterloh & Lemme, 1987), university professors<br />
(N = 7) and selected postgraduate students in management science (N = 22) systematically<br />
used more efficient strategies and achieved better performance scores in controlling the<br />
economic system than unselected students (N = 29) did. For the latter subjects, the intelligence<br />
test scores were controlled; they are not correlated with success in system control.<br />
Following the logic of this expert-novice paradigm, in a further study, field-grade officers<br />
(participants in a command and staff course) (N = 27) were compared with unselected students<br />
(N = 30) in controlling the fire system.<br />
Against expectations, no systematic differences between the two groups were found. Do these
negative results falsify a possible external validity of the system to predict success in higher-level
military careers? There are two arguments that make me inclined to respond to this
question with a negative answer. First, the military subjects are not homogeneous with respect<br />
366
to their decision-making behavior. Instead, in some parameters they show greater variance than<br />
students do. Second, some military subjects reported after the tests that they used the commands<br />
in accordance with their specific military education, and that use of such knowledge hinders a<br />
successful control. In contrast to this, other subjects did learn the specific conditions implemented<br />
in the fire system, and they did well. These data shed light on the different demands<br />
of the fire system, depending on the specific knowledge used while controlling it. Further<br />
investigations are needed to specify system demands and the strategies required to deal with<br />
them successfully.<br />
5. Conclusions<br />
(1) There are individual differences in intellectual abilities that are not covered by the usual<br />
intelligence tests. These differences may be of significance for personnel selection.<br />
(2) There are strategic differences in system control that are related to performance; they are<br />
reliable if the subjects are allowed to control a system in repeated trials.<br />
(3) Simulated systems realize complex demands that are standardized and replicable. Therefore,<br />
systems offer great advantages over standardized group situations.<br />
(4) Besides some evidence of the external validity of strategies and performance in system<br />
control further theoretical and empirical work needs to be done to specify the demands of real<br />
life situations and their correspondences with system demands.<br />
(5) Far from being able to predict precisely what strategies in system control imply for behavior<br />
in real life situations, I consider the reported approach to be worth further pursuit.<br />
References
Brehmer, B. (1987). Development of mental models for decision in technological systems. In J. Rasmussen, K. Duncan, & J. Leplat (Eds.), New Technology and Human Error (pp. 111-142). Chichester: Wiley.
Dörner, D. (1986). Diagnostik der operativen Intelligenz. Diagnostica, 32, 290-308.
Dörner, D. & Kreuzig, H.W. (1983). Problemlösefähigkeit und Intelligenz. Psychologische Rundschau, 34, 185-192.
Dörner, D. & Reither, F. (1978). Über das Problemlösen in sehr komplexen Realitätsbereichen. Zeitschrift für experimentelle und angewandte Psychologie, 25, 527-551.
Funke, J. (1983). Einige Bemerkungen zu Problemen der Problemlöseforschung oder: Ist Testintelligenz doch ein Prädiktor? Diagnostica, 29, 283-302.
Kleinmuntz, D.N. (1985). Cognitive heuristics and feedback in a dynamic decision environment. Management Science, 31, 680-702.
Putz-Osterloh, W. (1981). Über die Beziehung zwischen Testintelligenz und Problemlöseerfolg. Zeitschrift für Psychologie, 189, 79-100.
Putz-Osterloh, W. (1987). Gibt es Experten für komplexe Probleme? Zeitschrift für Psychologie, 195, 63-84.
Putz-Osterloh, W. & Lemme, M. (1987). Knowledge and its intelligent application to problem solving. The German Journal of Psychology, 11, 286-303.
Strohschneider, S. (1986). Zur Stabilität und Validität von Handeln in komplexen Realitätsbereichen. Sprache & Kognition, 5, 42-48.
367
A Special Approach in Assessment-based Personnel Selection<br />
G. Rodel<br />
German Naval Volunteer Recruiting Centre<br />
Wilhelmshaven, Federal Republic of Germany<br />
Introduction:<br />
Due to lower birth rates in the past and to political developments in the present, the FRG Armed
Forces have to deal with shrinking numbers of volunteers. The German Navy's efforts to exploit personnel
resources focus especially on draftees.
To become a temporary-career volunteer in the Federal German Navy there are three different ways of
enlistment:
The first way involves civilian volunteers applying to the Naval Volunteer Recruiting Centre (NVRC),<br />
where their aptitude for a temporary-career enlistment is tested (selection) prior to their placement in<br />
the Navy. 65% of all temporary-career volunteers enter the Navy this way, via the NVRC.<br />
The second way is the recruitment of conscripts serving in field units. About 10% of the temporary-career
volunteers are recruited in this way.
The third possibility of becoming a temporary-career volunteer in the Navy is through the so-called<br />
“Information and counseling campaign” (IBA). I would now like to give a more detailed account of this<br />
model of recruitment.<br />
The IBA completes the quarterly temporary-career volunteer requirements which have not been met
by the NVRC and at troop level. If the NVRC enlists a great number of volunteers for a specific quarter,
the complementary recruitment requirements to be met by the IBA are correspondingly smaller. Thus, the
number of volunteers that have to be recruited by the IBA is subject to fluctuation. Usually, this
complementary share lies between 25 and 35 percent of the total requirement of temporary-career volunteers.
In this context I would like to give you some figures underlying the importance of the IBA for the German<br />
Navy:<br />
The Navy has a strength of approximately 29,000 soldiers (excluding officers), of whom 3,000 are in their
basic military service, 16,000 are temporary-career volunteers and 7,800 are regulars.
Every year, about 1,000 soldiers are recruited by IBA as temporary-career volunteers. This amounts to<br />
a quarter of the annual requirements.<br />
The military training system of the Federal German Navy presents great advantages for the realisation<br />
of such a recruitment campaign. Training is provided centrally at only nine training centres (so-called
schools), the maximum distance between these centres being 500 kilometres. All Navy training
courses are held at these training centres. These training courses permit us a focused approach to all<br />
students for the purpose of recruitment, examination and placement.<br />
Another advantage is the central personnel management in the Navy under the responsibility of the<br />
Navy Enlisted Personnel Office, which keeps us informed about the specific requirements of the Navy<br />
for every quarter. In this way, we can steer the applicants for a placement in specific tasks or jobs.<br />
368
In the Federal Armed Forces, this campaign is unique, and feasible only in the Navy for the reasons I<br />
have explained earlier.<br />
The system of NVRC selecting personnel from the field units for extended military service in the Navy<br />
has already existed for more than 22 years. However, until three years ago, the field units had not been<br />
involved directly in this selection procedure. This task had been performed exclusively by NVRC.<br />
That means that the field units were not sufficiently concerned about recruitment, counseling and<br />
selection of new personnel, leaving this task to other navy institutions such as the Navy Enlisted Personnel<br />
Office, the Naval Office and the NVRC.<br />
This campaign is of particular importance especially now, in a period marked by a drop in personnel
owing to age groups with declining birth rates and to a lack of motivation and of insight into the necessity
of armies in the face of the détente in West-East relations. The new procedure should lead to an active
participation of superiors in field units as multipliers in the process of enlisting, counseling and recruiting.
Recruits are approached about a temporary-career enlistment as early as the second month of their
basic training.
As the readiness to volunteer for a temporary-career enlistment is greatest during the first four
months of basic military service, it is absolutely necessary to conduct the IBA during this
period.
Therefore, testing takes place in situ at the basic training unit.<br />
Method:<br />
During the first phase of the IBA, officers go to the nine basic training garrisons every quarter in order
to recruit (advertise) volunteers for enlistment in the Navy by means of films, lectures and counseling,
and to inform them about military and vocational possibilities.
During the second phase, psychologists go to the different garrisons two weeks later in order to examine<br />
the recruits who, during the first phase, have shown an interest in a temporary-career enlistment.<br />
Under the stipulations of the new procedure governing the recruitment of the suitable personnel for<br />
the forces, the task has to be performed jointly by the NVRC and the forces.<br />
The NVRC psychologists have been entrusted with this task for reasons of ensuring the application of<br />
uniform standards to the evaluation of applicants with or without prior service concerning their aptitude<br />
for a temporary-career enlistment and because of the fact that these psychologists have many<br />
years’ experience in personnel selection testing.<br />
The psychologist as well as the superior in the unit are directly and equally involved in the responsibility<br />
for the recruitment of personnel.<br />
By including the forces, this new methodology also takes into account the fact that the validity, i.e. the
quality of a statement on a person's aptitude, increases considerably if the person concerned is
evaluated separately and independently, as compared to cases in which observation, examination
and decision are made jointly and simultaneously.
For an evaluation of the applicant during the psycho-diagnostic interview, the following documents are<br />
available to the psychologist:<br />
369
Medical certificate (exclusions from certain assignments)<br />
Aptitude test results<br />
General application documents<br />
School reports<br />
Testimonials<br />
Curriculum vitae<br />
First, the superior is initiated into the procedure and trained as a rater by the psychologist. On his own
responsibility, and independently, he then observes, judges and evaluates the applicant's military conduct
and his qualification for a temporary-career enlistment.
The superior’s aptitude statement must have been completed independently before the psychologist<br />
starts the aptitude test based on following documents:<br />
1. Application documents.
2. School reports and reports of professional performance.
3. Declaration of pending proceedings and financial liabilities.
4. Contributions to an efficiency assessment, in which section and platoon leaders record their
observations, judgements and evaluations of the applicant's military conduct in the following
areas of activity:
- in general and specialized instruction
- in practical technical training
- in hand weapon training
- during drills
- in physical training
- in march training
- in field training
Based on the contributions to an efficiency assessment and on his own conclusions from a personal interview<br />
with the applicant, the superior has to evaluate the following aptitude characteristics:<br />
- devotion to duty<br />
- comradeship<br />
- technical abilities<br />
- self assertion<br />
From the documents and the results of the psycho-diagnostic interview, the psychologist evaluates the<br />
aptitude characteristics:<br />
- initiative<br />
- motivation to perform<br />
- articulateness (verbal comprehension and expression)<br />
- judgement<br />
The characteristics “sense of responsibility” and “performance under stress” have to be judged by both<br />
the psychologist and the superior.<br />
Four gradations are at the superior's and the psychologist's disposal for their recommendations.<br />
After the psychologist and the superior have made their evaluations independently, this commission<br />
prepares a joint decision on acceptance or rejection of the applicant for a temporary-career enlistment.<br />
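The two-rater step just described, independent evaluation followed by a joint decision, can be sketched schematically. The four gradations and the combination rule below are hypothetical, since the source names neither.

```python
# Illustrative sketch of the two-rater decision step described above.
# The four gradations and the combination rule are hypothetical; the
# source states only that both raters judge independently and then
# reach a joint decision.
GRADES = {"very suitable": 4, "suitable": 3, "limited": 2, "not suitable": 1}

def joint_decision(superior_grade, psychologist_grade, cutoff=3):
    s = GRADES[superior_grade]
    p = GRADES[psychologist_grade]
    # accept only if the two independent ratings average at or above cutoff
    return "accept" if (s + p) / 2 >= cutoff else "reject"

print(joint_decision("very suitable", "suitable"))
```

The point of the design is that `superior_grade` and `psychologist_grade` are produced without knowledge of each other, which is what the text credits with the gain in validity.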
Then it is the psychologist's task to determine a suitable placement for the applicant and to discuss it in<br />
detail.<br />
370<br />
Evaluation of the counseling procedure<br />
For the time being, a long-term validity study is not yet available, as the new procedure has existed for
only three years and soldiers have not held their posts in the units long enough to show whether the
counseling procedure has proved its worth.
However, a comparison of the different recruitment procedures of the NVRC and the IBA already<br />
permits a statement on the quality of the new procedure. In this case, NCO training course results obtained<br />
by soldiers that have been recruited for the Navy by NVRC and those recruited through IBA<br />
can be compared.<br />
It was to be expected that the results of the training course would not differ significantly as the<br />
psychologists involved are the same in both cases, and they can make use of their many years’ experience<br />
of test methodology.<br />
However, the results also confirm the application of uniform standards to both procedures.<br />
A further confirmation of the new IBA procedure comes from an opinion poll on personal involvement
in, and acceptance of, the new procedure, which yielded the following result: out of 84 superiors, only
3 officers had a negative or indifferent opinion about the way the IBA is practised now.
A comparison of the absolute figures of recent years makes little sense because of the decreasing
number of applicants volunteering for the Navy. The relative frequencies, however, show that the new
procedure has succeeded: the enlistment rate was about 77.8% in 1986 and rose to 90.4% by 1989.
The difference is not statistically significant. For the Navy, however, it means that the absolute number
of enlistments has remained nearly constant in recent years, even though the total number of applicants
has declined.
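The claim that the rise from 77.8% to 90.4% is not statistically significant can be illustrated with a two-proportion z-test. The raw counts below (35/45 and 47/52) are invented to match the reported percentages, since the source gives rates only, so the computation is purely illustrative.

```python
# Two-proportion z-test sketch for the enlistment rates quoted above.
# The raw counts (35/45 and 47/52) are invented to match the reported
# percentages of 77.8% and 90.4%; the source reports rates only.
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # pooled success proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

z = two_proportion_z(35, 45, 47, 52)
print(round(z, 2))   # below the 1.96 cutoff: not significant at the 5% level
```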
Conclusions:<br />
I cannot judge whether this procedure designed to recruit suitable personnel for the Navy can also be<br />
transferred to other Navies. According to the Naval Staff, this procedure has proved its worth in the<br />
German Navy. The forces and their superiors themselves feel that they are more actively involved in<br />
the process of recruiting and therefore they intensify their counseling efforts for individual applicants<br />
thus acting as multipliers.<br />
371<br />
TROUBLESHOOTING ASSESSMENT AND ENHANCEMENT (TAE) PROGRAM:<br />
TEST AND EVALUATION RESULTS *<br />
Paper Presented by Dr. Harry B. Conner,<br />
Navy Personnel Research and Development Center, San Diego, CA 92152-6800
32nd ANNUAL MILITARY TESTING ASSOCIATION CONFERENCE
November 5-9, 1990, Orange Beach, Alabama
Nauta (1984) reported on a number of difficulties associated with the U.S. Navy's ability to<br />
maintain its weapons systems. He reported the costs of poor performance of maintenance personnel and
recommended areas requiring investigation if performance of these personnel was to improve. At about the
same time at the Navy Personnel Research and Development Center (NPRDC), we determined that one of
the difficulties we had encountered in the test and evaluation of an ongoing project (the Enlisted Personnel
Individualized Career System, EPICS) was that we had no way of comparing maintenance personnel in the most
important aspect of their performance: troubleshooting of the hardware system. We realized that we needed
an objective way to evaluate personnel performance in the skill of troubleshooting. A literature search
supported the contention that most research and development efforts in this area start with the premise of a
known expert, journeyman/master, or experienced troubleshooter when in fact these are defined rather than
empirically determined. Therefore, we concluded that efforts to improve maintenance personnel
troubleshooting performance were futile until we could empirically and objectively define how a good
troubleshooter performs.
Approach. We addressed this evaluation issue first with a feasibility study (Conner 1988, 1987) followed
by a more structured investigation, the Troubleshooting Assessment and Enhancement (TAE) program. The
TAE objective was to design, develop, test, and evaluate a low-cost troubleshooting evaluation capability.
The model (Figure 1) we used in our investigation shows that maintenance is just one of a number of
activities associated with a hardware system. Within the area of maintenance, one can perform preventative
or corrective maintenance. Within corrective maintenance, one troubleshoots or repairs. Specifically, we
focused on the skill of troubleshooting, which we considered to be a skill of problem solving requiring abstract
conceptualization capabilities.
[Figure 1. Hardware Activity to Troubleshooting: HARDWARE SYSTEM INTERACTIONS branches into<br />
CONSTRUCT, INSTALL, OPERATE, and MAINTAIN; MAINTAIN branches into PREVENTIVE MAINTENANCE<br />
and CORRECTIVE MAINTENANCE; CORRECTIVE MAINTENANCE branches into TROUBLESHOOTING<br />
and REPAIR.]<br />
With 25 subject matter experts, we developed a list of factors to be used to evaluate the proficiency of a<br />
troubleshooting technician in a high tech environment; that is, systems having state-of-the-art electronics and<br />
computers requiring troubleshooting. Next, we sent our initial factors list with definitions (shown in Table 1)<br />
to 1200 operational hi-tech personnel for ranking. The results were then weighted by a jury of experts (on the<br />
system under investigation). Once the factors were weighted, a scoring methodology was developed. Table<br />
2 provides the results of the factor development, weighting, and TAE scoring scheme. Our literature search<br />
caused us to add a tenth factor: redundant checks.<br />
TABLE 1. Factor Definitions<br />
Rank  Factor                      Definition<br />
1     Solution                    Problem is correctly solved; fault is identified.<br />
2     Cost (Incorrect Solutions)  Number of Lowest Replaceable Units (LRUs) incorrectly identified as faulty.<br />
3     Time                        Total minutes from login to logout taken to find the fault.<br />
4     Proof Points                Test points that positively identify LRUs as faulty.<br />
5     Illogical Approaches        Inappropriate equipment selection.<br />
6     Invalid Checks              Inappropriate test at appropriate test point.<br />
7     Out-of-Bounds               Inappropriate test point was selected.<br />
8     Test Points                 Total number of valid reference designator tests.<br />
9     Checks                      Total number of tests performed at all test points.<br />
10    Redundant Checks            Same test performed at same point during the episode.<br />
* The opinions expressed in this paper are those of the author, are not official, and do not necessarily reflect the views of the Navy Department.<br />
TABLE 2. Ranking, Weighting, and Scoring for Troubleshooting Evaluation Factors<br />
Rank  Factor                Weight  Scoring Scale   Scoring (Per event)<br />
                                    (Max Points)<br />
1     Solution              42.78   --              -100 for fail to find<br />
2     Cost (Incorrect Sol)  13.13   'EZ             -0.5 X ea NFR LRU<br />
3     Time                  11.80   20.62           -0.5 X ea Minute<br />
4     Proof Points           9.88   17.23           -% X ea Proof Pt missed<br />
5     Illogical Approach     6.87   12.01           -6.0 X ea Illogical App<br />
6     Invalid Checks         4.68    8.18           -0.8 X ea Invalid Check<br />
7     Out-of-Bounds          4.00    8.99           -0.6 X ea Out-of-Bounds<br />
8     Test Points            3.21    5.61           -0.5 X # of Tests<br />
9     Checks                 3.08    5.38           -0.5 X # of Checks<br />
10    Redundant Checks       tbd     tbd            to be analyzed<br />
Scoring is designed to discriminate between levels of troubleshooting proficiency: failure to solve the<br />
problem results in a score of 0, while solving the problem results in a score of 100. There is no partial score<br />
for factor 1. Ability to discriminate between levels of troubleshooting proficiency is in scoring of the remaining<br />
factors. Weights for the factors were converted into a scale equaling 100 points. The final score for each<br />
subject equals 100 points minus the sum of points lost for each factor. The minimum score is 0; that is, no<br />
negative scores. The scoring criteria for each factor, also shown in Table 2, are the weights that were used<br />
in the TAE episodes to evaluate and diagnose troubleshooting proficiency levels. The cost factor was changed<br />
to incorrect solutions to more accurately describe the actual behavior.<br />
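The scoring rule described above (start at 100 points, subtract a fixed penalty per event, floor the result at 0, and score 0 outright on failure) can be sketched as follows. This is an illustrative reconstruction, not the study's actual software: the per-event rates follow Table 2 where legible, and the proof-point and test-point rates are assumptions because those entries are garbled in the source.

```python
# Hedged sketch of the TAE scoring scheme: a subject starts at 100 points,
# loses points per event at (approximately) the Table 2 rates, and cannot
# score below 0.  Failing to find the fault scores 0 outright.
PENALTY_PER_EVENT = {
    "incorrect_solutions": 0.5,   # each LRU wrongly replaced (NFR LRU)
    "minutes": 0.5,               # each minute from login to logout
    "proof_points_missed": 1.0,   # ASSUMED rate; entry illegible in Table 2
    "illogical_approaches": 6.0,
    "invalid_checks": 0.8,
    "out_of_bounds": 0.6,
    "tests": 0.5,                 # ASSUMED rate; entry garbled in Table 2
    "checks": 0.5,
}

def tae_score(found_solution, events):
    """Final episode score: 0 on failure, else 100 minus penalties, floored at 0."""
    if not found_solution:
        return 0.0
    lost = sum(rate * events.get(name, 0) for name, rate in PENALTY_PER_EVENT.items())
    return max(0.0, 100.0 - lost)

# Example: fault found in 30 minutes with one wrong LRU and two invalid checks.
score = tae_score(True, {"minutes": 30, "incorrect_solutions": 1, "invalid_checks": 2})
```

Because every factor only subtracts points, the scheme rewards fast, direct paths to the fault while still distinguishing a slow success from an outright failure.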
Once we had determined factors and scoring scheme, we selected and constructed practical<br />
troubleshooting episodes that provided a valid representation of the hardware system being used in the<br />
study. Our hardware system was the U.S. Navy's communications system, the Navy Modular Automated<br />
Communications System/Satellite Communications (NAVMACS/SATCOM). To construct TAE<br />
troubleshooting episodes, we focused on the fault diagnosis/problem solving behaviors (Table 3) that military<br />
schools have identified in their six step troubleshooting process (Conner 1986, 1987).<br />
TABLE 3. Six Step Troubleshooting Process<br />
1. Symptom Recognition<br />
2. Symptom Elaboration<br />
3. Probable Faulty Functions<br />
4. Localizing Faulty Function(s)<br />
5. Isolating Faulty Circuit<br />
6. Failure Analysis.<br />
Although the design and delivery of the troubleshooting episodes did not require a computer, the amount<br />
of data made it obvious that the only efficient and cost effective approach would be utilization of microcomputer<br />
delivery and data gathering. Also, to keep developmental and hardware costs down, we limited ourselves to<br />
using off-the-shelf technology. We also reduced the “troubleshooting universe” of the episodes so that a<br />
standard microcomputer memory could handle data.<br />
The model developed for the troubleshooting activity on a given piece of hardware (shown in Figure 2)<br />
provides a TAE Factors Model for "System Troubleshooting." The model works as follows: Once a system<br />
is determined to be inoperative, the fault symptoms reduce the universe of type and location of tests to be<br />
made to a reasonable spectrum for further investigation; that is, the symptoms bound the problem and<br />
establish what is in or out of bounds. This bounding of the problem reduces the number of tests in the<br />
spectrum to a reasonable number and limits the amount of computer memory necessary. We called the "in<br />
bounds" checks that are not logical for the fault symptoms the "illogical approach." For a given set of symptoms<br />
for a given fault, there is an optimum troubleshooting path to determine the problem. To prove a component,<br />
or unit, is bad, a number of tests must be performed; this requires testing of the "proof points."<br />
[Figure 2. TAE Factors Model: the troubleshooting universe for a fault, showing the fault itself, the optimum<br />
path, proof points, illogical approaches, and the in-bounds and out-of-bounds regions of the test spectrum.]<br />
The goal in the TAE testing is to find and replace the LRU. Subjects begin TAE testing by reviewing a series<br />
of menus of symptoms, panels, and diagnostic information; next they select equipment to be tested and<br />
conduct tests or replace an LRU.<br />
Research Hypotheses. The 20 hypotheses for the TAE Test and Evaluation were organized into seven<br />
categories: experience, electronics knowledge, electronics performance proficiency, difficulty level, time,<br />
complex test equipment, and ranking. The hypotheses in each category, and method of testing each, are<br />
described in the following sections.<br />
METHOD<br />
Test Administration Procedures. Testing was conducted by NPRDC personnel in a classroom at the<br />
Advanced Electronics School Department (AESD), Service Schools Command, San Diego, California.<br />
Testing was on the Zenith 248 microcomputer. Technical documentation for the hardware system was in the<br />
classroom. Subjects were assigned randomized test sequences to protect against test-order effects. Sixteen<br />
episodes were administered to each subject and each episode required about an hour to complete, but<br />
subjects had no specific time limit. Subjects completed all episodes in two to three days. The administrator<br />
was present in the classroom during testing. Subjects listened to an introduction to the TAE study and the<br />
technical documentation available; read and signed a Privacy Act release statement; and completed a<br />
computerized Learn Program, 2 practice episodes, and 14 test troubleshooting episodes. After testing, subjects<br />
received test performance feedback and completed a critique.<br />
Subjects. Subjects for the TAE test and evaluation were students in the “system” phase of the maintenance<br />
course and the system qualified instructors. All subjects were required to have school training on the<br />
subsystems.<br />
Data. Data were collected for 53 students and 13 instructors in two databases, using a standard<br />
statistical package for analysis. The first contained demographic data; the second, performance data. Data<br />
were collected for seven classes of students between April and September 1989. Demographic data for each<br />
student included: SSN, time in service, Armed Services Vocational Aptitude Battery (ASVAB) scores, school<br />
subsystem scores, school comprehensive score, school final score, class ranking, TAE ranking, and instructor<br />
ranking. Demographic and TAE performance data for instructors were collected during September 1989. The<br />
demographic data for instructors included SSN, rate/rating, time in service, time in paygrade, time system<br />
qualified, and time working on the system in the fleet and as a system instructor. The TAE program data for both<br />
students and instructors consisted of scores for 16 episodes encompassing 673 variables. Table A-1 describes<br />
the variables for each episode (Episode 1 is presented).<br />
Data files were refined and evaluated. Data for five students were dropped due to missing data, and for<br />
two instructors due to lack of system qualification. Thus, the data of 59 subjects were used for this study: 48<br />
students and 11 instructors. The resultant database was used to create files for testing the study hypotheses.<br />
The master file was used to create files with variables specifically required to test each hypothesis. The<br />
methods for testing the hypotheses are described in the following subsections.<br />
RESULTS and DISCUSSION<br />
Results of the data analyses are presented in Appendix A, and the specific areas investigated are discussed<br />
in the following:<br />
Demographic Data. For the 48 students, the average time in service was 2.23 years. For the 11 instructors,<br />
9 had a rate of electronics technician first class (ET1) and 2, of ET2; the average paygrade was 5.82. The<br />
average time in service for instructors was 10.41 years and average time in paygrade was 3.64 years.<br />
Instructors were system qualified for an average of 4.67 years and had worked on the system hardware in the<br />
fleet an average of 2.94 years. In addition, they averaged 16.18 months as instructors.<br />
Experience (Table A-2). Hypothesis 1. Instructors (experts) will score significantly higher on the TAE test<br />
than students (novices). A one-way analysis of variance (ANOVA) was performed to test hypothesis 1. The<br />
F ratio value is not significant.<br />
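The one-way ANOVA used for hypothesis 1 compares between-group variance (students vs. instructors) to within-group variance of TAE scores. A minimal sketch of that computation, using only the standard library; the score lists below are made-up illustrative numbers, not the study's data:

```python
from statistics import mean

def one_way_anova(groups):
    """One-way ANOVA over lists of scores.

    Returns (F, df_between, df_within), where
    F = (SS_between / df_between) / (SS_within / df_within).
    """
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Illustrative data only (not the study's 48 students and 11 instructors):
students = [68.0, 72.0, 70.0, 71.0]
instructors = [74.0, 73.0, 75.0]
f_ratio, dfb, dfw = one_way_anova([students, instructors])
```

For the study's actual groups, the appendix reports F = 2.271 with 1 and 57 degrees of freedom, which falls short of significance.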
Hypothesis 2. Subjects with a longer time in the electronics rate (i.e., Time in Service - TIS) will score<br />
significantly higher on the TAE test than subjects with less time in that rate.<br />
Generally, the relationship between experience and TAE performance was not statistically significant. This<br />
apparent anomaly may be explained by the fact that instructors of the course are not required to be system<br />
qualified. Students must prove their system qualification to graduate.<br />
The lack of a significant relationship between experience and troubleshooting performance causes one to<br />
question if the experience measures were appropriate, if an appropriate set of subjects was tested, if the TAE<br />
delivery and evaluation systems are valid, or if there is actually no difference due to experience. Given the face<br />
validity of TAE and the high level of expectation by subject matter experts of the relationship between<br />
experience and performance, further testing is needed to resolve this issue.<br />
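The correlational hypotheses here and in the sections that follow are evaluated by comparing a Pearson correlation against a critical value tabled for the sample size; the .21638 shown for N = 59 in the appendix is consistent with a one-tailed .05 test via r_crit = t / sqrt(t^2 + N - 2), though the paper does not state this explicitly. A sketch under that assumption, with illustrative names and data:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def critical_r(t_quantile, n):
    """Critical correlation for df = n - 2, given the matching t quantile."""
    df = n - 2
    return t_quantile / sqrt(t_quantile ** 2 + df)

# For N = 59 the one-tailed .05 t quantile at df = 57 is about 1.672, so the
# critical r is about .216, matching the appendix tables.
r_crit = critical_r(1.672, 59)
# Illustrative data, not the study's:
significant = abs(pearson_r([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])) > r_crit
```

An observed correlation whose magnitude exceeds the critical value is flagged significant (starred in the appendix tables); directional hypotheses like these make the one-tailed reading natural.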
Electronics Knowledge (Table A-3). Hypothesis 3. Students with higher academic school final scores<br />
will score higher on the TAE test than students with lower scores. The correlation between academic school<br />
final scores (overall course final score) and TAE test scores is significant at the .05 level. However, the correlation<br />
between academic school comprehensive scores (final test) and TAE test scores is positive but not<br />
significant. Therefore, academic school final scores were significantly correlated with TAE test scores, but<br />
school comprehensive scores were not.<br />
Hypothesis 4. Students with higher academic school subsystem test scores will score higher on the TAE<br />
subsystem tests (episodes) than students with lower school subsystem test scores. For Subsystem 1, the<br />
correlation of academic school subsystem test scores with TAE subsystem test scores is significant at the .05<br />
level. Subsystem 2 has a positive correlation, which is not significant. Both Subsystems 3 and 4 have negative<br />
correlations, which are not significant. Therefore, the only significant correlation between academic school<br />
subsystem test scores and TAE subsystem test scores was for Subsystem 1 (the computer).<br />
Hypothesis 5. Students with higher appropriate Armed Services Vocational Aptitude Battery (ASVAB)<br />
scores for Electronics Technician selection in general science, electronics information, mathematics<br />
knowledge, arithmetic reasoning ([GS + EI + MK] + AR), and the armed forces qualification test (AFQT), will<br />
score higher on the TAE test than subjects with lower ASVAB and selection scores. All but one of the<br />
correlations is negative. The only significant correlation between ASVAB scores and TAE score is Arithmetic<br />
Reasoning (AR) with a negative correlation significant at the .05 level. The only positive correlation is between<br />
General Science (GS) and TAE score, which was not significant.<br />
There was no generally consistent relationship between electronics knowledge and TAE performance.<br />
There was a relationship where performance testing was a component of the academic score used. There<br />
was, however, a negative relationship between the scores used to determine selection to the occupational<br />
specialty (electronics technician) and performance scores.<br />
The lack of relationships of electronic theory or academics and troubleshooting performance needs further<br />
investigation. As with a number of other studies of this type, there was no consistent relationship between<br />
knowledge of theory and the ability to perform. This may have been related to the method of determining<br />
knowledge and academic success in the school. Testing in the school does not appear to provide<br />
discriminatory capability and correlational analyses do not show statistically significant results. Schools<br />
should ensure tests discriminate between students' academic and performance abilities and assess student<br />
behaviors in a more structured, formalized, objective way. Otherwise, effects of a change to instructional<br />
methods or techniques cannot be assessed in terms of course outcomes. Further TAE testing might determine<br />
the resulting relationships.<br />
Also, the relationships of selection requirements and troubleshooting performance need further<br />
investigation. Of greatest interest is the failure of performance results to positively relate to the ASVAB scores<br />
used to select personnel for this occupational specialty. The consistent negative trend seems to indicate<br />
that, while the ASVAB tests may relate to academic performance, there may be no relationship between<br />
ASVABs, TAE performance, and/or on-the-job performance.<br />
Electronics Performance Proficiency (Table A-4). Hypothesis 6. Subjects with a higher level of<br />
troubleshooting proficiency will make fewer invalid checks than less proficient subjects. The correlation<br />
between TAE score and the number of invalid checks is not significant.<br />
Hypothesis 7. Subjects with a higher level of troubleshooting proficiency will make fewer illogical<br />
approaches than less proficient subjects. The correlation between TAE score and the number of illogical<br />
approaches is significant at the .01 level.<br />
Hypothesis 8. Subjects with a higher level of troubleshooting proficiency will make fewer incorrect<br />
solutions than less proficient subjects. The correlation between the TAE score and the number of incorrect<br />
solutions is significant at the .001 level.<br />
Hypothesis 9. Subjects with a higher level of troubleshooting proficiency will make fewer redundant checks<br />
than less proficient subjects. The correlation between TAE score and the number of redundant checks is not<br />
significant.<br />
Hypothesis 10. Subjects with a higher level of troubleshooting proficiency will test significantly more proof<br />
points than less proficient subjects. The correlation between the TAE score and the number of proof points<br />
is significant at the .001 level.<br />
Hypothesis 11. Subjects with a higher level of troubleshooting proficiency will make significantly fewer<br />
tests than less proficient subjects. The correlation between the level of troubleshooting proficiency and<br />
number of tests is significant at the .001 level.<br />
The only proficiency factors that failed to show significance were invalid and redundant checks, which<br />
could have been caused by design of the delivery system and/or the method of determining these factors.<br />
This set of hypotheses strongly supports the validity of the TAE technique and approach.<br />
The utility of the TAE as a job performance measure and as an objective measure of readiness in the skill<br />
area addressed (in this case, system troubleshooting) should be investigated further.<br />
Difficulty (Table A-5). Hypothesis 12. The more difficult the episodes, the longer the average time<br />
needed to find the solution. The correlation of TAE difficulty with length of time to find the solution is significant<br />
at the .001 level.<br />
Hypothesis 13. On episodes of equal difficulty, subjects with a higher level of troubleshooting proficiency<br />
will take significantly less time than less proficient subjects in finding the solution. Episode difficulty levels<br />
were determined and episodes were grouped, with level 1 being the easiest and level 5 the most difficult, as<br />
follows: (1) 2 episodes, (2) 4 episodes, (3) 3 episodes, (4) 2 episodes, and (5) 3 episodes. Hypothesis 13 was<br />
significantly supported for each level.<br />
Hypothesis 14. The more difficult the episode, the less time the instructors will take to find the TAE test<br />
solutions when compared to the students (novices). The difficulty level of the episode and the difference in<br />
time between instructors and students to find TAE test solutions is negatively correlated but not significant.<br />
Although no significant difference was found, the more difficult the episode, the less time instructors tended<br />
to take to find the TAE test solutions when compared to the students.<br />
Generally, the results were as expected; that is, the more difficult, the more time; at different levels of<br />
difficulty, better performers took less time. An unexpected result was the lack of significant difference between<br />
students and instructors. The difference was, however, strongly in the direction expected.<br />
The consistently significant relationship in this area clearly calls for further investigation and improvement,<br />
particularly in behavioral and cognitive task analyses.<br />
Time (Table A-6). Hypothesis 15. Subjects with a higher level of troubleshooting proficiency will take<br />
significantly less total time to find TAE episode solutions than less proficient subjects. The correlation between<br />
TAE score and total time to find the episode fault is significant at the .001 level.<br />
Hypothesis 16. Subjects with higher levels of troubleshooting proficiency will take a significantly longer<br />
time than less proficient subjects before making the first test point. The correlation between TAE score and<br />
time to first test point is significant at the .05 level.<br />
Results suggest that analysis of behavior and cognitive protocols could result in a dramatic change in the<br />
way the training community presents troubleshooting training. Here again, behavioral protocol analysis could<br />
provide useful information on training approaches.<br />
Complex Test Equipment (Table A-7). Hypothesis 17. Subjects with a higher level of troubleshooting<br />
proficiency will make significantly more tests using an oscilloscope than less proficient subjects. The<br />
correlation between TAE score and the number of oscilloscope tests is not significant.<br />
Given the nature of the hardware system and the resulting TAE delivery system, subjects did not appear<br />
to have sufficient opportunity to use complex test equipment in the TAE episodes. Therefore, the lack of a<br />
statistically significant result may have no practical meaning.<br />
Ranking (Table A-8). Hypothesis 18. The higher the student's TAE class rank, the higher the student will<br />
be ranked in terms of troubleshooting proficiency by instructors or work center supervisors. Hypothesis 18<br />
was supported for two classes at the .001 level. The correlation between TAE class ranking and instructor/work<br />
center supervisor ranking was not significant for the other classes. Although not significant, two classes had<br />
an inverse relationship.<br />
Hypothesis 19. The higher the student's TAE class rank (final score), the higher will be the student's<br />
ranking in the class. Hypothesis 19 was supported for one class at the .01 level of significance. For the other<br />
classes, the correlation between TAE class ranking and ranking in school class was not significant. Although<br />
not significant, two classes indicated a strong positive correlation. Conversely, one class showed a strong<br />
inverse relationship between TAE class ranking and school class ranking.<br />
Hypothesis 20. The higher the instructor ranking of the student in terms of troubleshooting proficiency,<br />
the higher will be the student’s ranking in the class (final score). Hypothesis 20 was supported for three<br />
classes, one class at the .001 level and two at the .05 level. Although not significant, one class showed a strong<br />
positive correlation between instructor student ranking and class student ranking. One class showed a weaker<br />
positive correlation and two classes indicated an inverse relationship.<br />
There were no consistent results in rankings across instructors, TAE performance, or school performance.<br />
In several classes, inverse relationships were shown. Only one class had a consistent significant relationship<br />
across hypotheses.<br />
The results of this area most clearly attest to the need for an objective evaluation tool for the skill of<br />
troubleshooting. They show that supervisor rankings and school results do not have the ability to evaluate<br />
personnel in this skill.<br />
FUTURE EFFORTS<br />
In addition to the recommendations made for each area of investigation, we also have the following general<br />
recommendations for future efforts in this area.<br />
1. Further investigate TAE validity and reliability. Design and development of the TAE approach and<br />
delivery system strongly support face validity of TAE. Subject matter experts were involved in all phases of<br />
the project. They determined the factors of evaluation, weights of the factors, evaluation scheme, and<br />
troubleshooting episodes to be used; developed the episodes; and participated in the test and evaluation.<br />
Since T&E results are somewhat ambiguous, areas dealing with validity and reliability should be investigated<br />
further.<br />
2. Analyze data to further develop discriminatory/predictive capability. Results of performance of<br />
subjects on TAE episodes should be subjected to behavioral protocol analyses to develop a model of<br />
troubleshooting, further analyses of approaches used by good vs. bad troubleshooters, and ultimately<br />
cognitive protocol analyses to determine selection, training, and evaluation requirements.<br />
3. Further test the TAE approach on a larger and more comprehensive population and on other<br />
equipment. Further investigation should use hardware that allows wider and less restrictive utilization of test<br />
equipment. It may also be possible to select specific troubleshooting episodes that enable wider utilization<br />
of more types of test equipment. This type of investigation should take place to determine if certain episodes<br />
and hardware types require special test equipment use capability. Investigate this approach in other high-tech<br />
hardware systems as well as other occupational areas (i.e., mechanical hardware troubleshooters/repair<br />
personnel). A TAE-type delivery system should be developed for a number of other high- and mid-tech<br />
hardware systems.<br />
4. Develop more troubleshooting episodes to provide directive training, guided training, and tests with<br />
feedback. Then, a complete and comprehensive troubleshooting skill development, maintenance,<br />
assessment, and evaluation program would be available for personnel from novice to expert skill levels. TAE<br />
could be used for active duty personnel in a school or fleet environment and for reserve personnel at the<br />
readiness centers or aboard ship during active duty periods.<br />
For greater detail on the background, design/development and administration, and the test and evaluation<br />
results consult: Conner and Hassebrock (in press); Conner, Hartley, and Mark (in press); and Conner, Poirier,<br />
Ullrich, and Bridges (in press).<br />
REFERENCES<br />
Conner, H. B. (1988, October). Troubleshooting Proficiency Evaluation Project (TPEP). In Proceedings of the<br />
Military Testing Association Conference, Mystic, Connecticut.<br />
Conner, H. B. (1987, April). Troubleshooting Proficiency Evaluation Project (TPEP). In Proceedings of the<br />
National Security Industrial Association National Manpower and Training Conference.<br />
Conner, H. B., & Hassebrock. (In press). Troubleshooting Assessment and Enhancement (TAE)<br />
Program: Theoretical, Methodological, Test and Evaluation Issues. San Diego: Navy Personnel Research<br />
and Development Center.<br />
Conner, H. B., Hartley, S., & Mark, L. J. (In press). Troubleshooting Assessment and Enhancement (TAE)<br />
Program: Test and Evaluation. San Diego: Navy Personnel Research and Development Center.<br />
Conner, H. B., Poirier, C., Ullrich, R., & Bridges, T. (In press). Troubleshooting Assessment and Enhancement<br />
(TAE) Program: Design, Development, and Administration. San Diego: Navy Personnel Research<br />
and Development Center.<br />
Nauta, F. (1984). Alleviating Fleet Maintenance Problems Through Maintenance Training and Aiding<br />
Research (NAVTRAEQUIPCEN MDA903-81-0188-1). Orlando: Naval Training Equipment Center.<br />
APPENDIX A<br />
TAE DATA and ANALYSIS RESULTS<br />
TABLE A-1. Variables for TAE Episode 1<br />
Variable  Contents of Variable<br />
V1   Subject's Social Security Number<br />
V2   Equipment (hardware subsystem) number (1 = USH26)<br />
V3   Episode number (1)<br />
V4   Found Solution (1 = Yes, 0 = No)<br />
V5   Number of Test Points<br />
V6   Number of Out-of-Bounds tests<br />
V7   Number of Valid Checks<br />
V8   Number of Invalid Checks<br />
V9   Number of Redundant Checks<br />
V10  Number of Proof Points subject tested<br />
V11  Total number of Proof Points in the episode<br />
V12  Percent proof points tested: (V10 / V11) * 100, rounded to a whole number<br />
V13  Total Time spent on the episode (in minutes)<br />
V14  To be determined<br />
V15  Number of Equipment Selection events<br />
V16  Number of Front Panel events<br />
V17  Number of Maintenance Panel events<br />
V18  Number of Fallback test events<br />
V19  Number of Reference Designator test events<br />
V20  Number of Replace LRU events<br />
V21  Number of Review Symptoms events<br />
V22  To be determined<br />
V23  Number of Diagnostic Test events<br />
V24  Number of Load Operational Program events<br />
V25  Number of Step Procedure events<br />
V26  Number of Revision events (instructor intervention)<br />
V27  Number of INCORRECT Replace LRU events<br />
V28  Number of GOOD FAULT Replace LRU events<br />
V29  Time to first Reference Designator Test (in minutes)<br />
V30  Time to first Diagnostic Test (in minutes)<br />
V31  Sum of all steps of episode: ALL events, except instructor actions<br />
V32  Number of Waveform tests performed<br />
V33  Number of Voltage tests performed<br />
V34  Number of Read Meter tests performed<br />
V35  Number of Logic tests performed<br />
V36  Number of Current tests performed<br />
V37  Number of Frequency tests performed<br />
V38  Number of Continuity tests performed<br />
V39  Number of Adjustment tests performed<br />
V40  Final Score of the episode<br />
V41  To be determined (for possible future expansion)<br />
V42  To be determined (for possible future expansion)<br />
V43  To be determined (for possible future expansion)<br />
TABLE A-2. Experience<br />
H1: Student TAE Test Score vs. Instructor TAE Test Score (Variable 1: TAESCORE)<br />
Group            Mean     N<br />
1 (Students)     70.396   48<br />
2 (Instructors)  73.422   11<br />
Grand Mean       70.980   59<br />
Source    Sum of Sqs  D.F.  Mean Sq  F Ratio  Prob.<br />
Between   81.973      1     81.973   2.271    .1373<br />
Within    2057.124    57    36.090<br />
Total     2139.098    58<br />
Correlational Hypothesis Statement  N   Correlation  Critical Value<br />
H2   TAE Score vs TIS               59  .13676       .21638<br />
TABLE A-3. Electronic Knowledge<br />
Correlational Hypothesis Statement       N   Correlation  Critical Value<br />
H3   TAE vs School Final                 48  .30181*      .24045<br />
     TAE vs School Comp                  48  .17311       .24045<br />
H4   Avg. TAE Subsystem vs. School Subsys 48<br />
     1 vs. 1                                 .27704*      .24045<br />
     2 vs. 2                                 .17579       .24045<br />
     3 vs. 3                                 -.18146      .24045<br />
     4 vs. 4                                 -.21972      .24045<br />
H5   TAE vs ASVABs                      48<br />
     AFQT                                   -.00398      .24045<br />
     AR                                     -.32510*     .24045<br />
     EI                                     -.96673      .24045<br />
     ASVAB1                                 -.02672      .24045<br />
     ASVABT                                 -.13055      .24045<br />
TABLE A-4. Electronic Performance Proficiency<br />
Correlational Hypothesis Statement       N   Correlation  Critical Value<br />
H6   TAE vs Invalid Checks               59  -.17107      .21638<br />
H7   TAE vs Illogical Approaches         59  -.34057**    .21638<br />
H8   TAE vs Incorrect Solutions          59  -.69676***   .21638<br />
H9   TAE vs Redundant Checks             59  -.98543      .21638<br />
H10  TAE vs Proof Points                 59  .56997***    .21638<br />
H11  TAE vs # of Tests                   59  -.55201***   .21638<br />
* p<.05. ** p<.01. *** p<.001.<br />
TABLE A-5. Difficulty Level<br />
Correlational Hypothesis Statement      N    Correlation    Critical Value<br />
H13  Ep Diff vs. Ep Time                14    .93051***     .45900<br />
H14  Ep Diff Lev vs Time<br />
     Level 1 (Easiest)                  59   -.81265***     .21638<br />
     Level 2                            59   -.3x04**       .21638<br />
     Level 3                            59   -.74653***     .21638<br />
     Level 4                            59   -.73553***     .21638<br />
     Level 5 (Hardest)                  59   -.58708***     .21638<br />
H15  Ep Diff vs. Time Dif               14   -.34658        .45900<br />
TABLE A-6. Time<br />
Correlational Hypothesis Statement      N    Correlation    Critical Value<br />
H16  TAE vs Time                        59   -.49233***     .21638<br />
H18  TAE vs Time to 1st Check           59   -.23814*       .21638<br />
TABLE A-7. Complex Test Equipment<br />
Correlational Hypothesis Statement      N    Correlation    Critical Value<br />
H17  TAE vs Oscope Use                  59   .18771         .21638<br />
TABLE A-8. Ranking<br />
Correlational Hypothesis Statement      N    Correlation    Critical Value<br />
H20  TAE Ranking vs Inst Ranking<br />
     Class 1                            7     .96429***     .87649<br />
     Class 2                            7     .35714        .67649<br />
     Class 3                            8     .46429        .87649<br />
     Class 4                            8     .46571        .73972<br />
     Class 5                            8    -.14286        .73972<br />
     Class 6                            7    -.07143        .82658<br />
     Class 7                            7     .96429***     .67649<br />
H21  TAE Rank vs Class Rank<br />
     Class 1                            7     .89286**      .67649<br />
     Class 2                            7     .57143        .67649<br />
     Class 3                            8    -.14286        .67649<br />
     Class 4                            6     .46571        .73972<br />
     Class 5                            7    -.37143        .82658<br />
     Class 6                            7     .90524        .73972<br />
     Class 7                            7     .60714        .87649<br />
H22  Class Rank vs Inst Ranking<br />
     Class 1                            8     .96429***     .67649<br />
     Class 2                                 -.35714        .67649<br />
     Class 3                            9     .02381        .82658<br />
     Class 4                            7     .75000*       .67649<br />
     Class 5                                                .82658<br />
     Class 6                                  .64286        .68697<br />
     Class 7                                                .67649<br />
379
Incrementing ASVAB Validity with<br />
Spatial and Perceptual-Psychomotor Tests<br />
Henry H. Busciglio<br />
U. S. Army Research Institute<br />
The Army's Project A is a long-term, comprehensive effort to<br />
improve the selection and classification of enlisted personnel.<br />
One objective of this effort was to develop and validate measures<br />
of abilities other than the general cognitive domain covered by<br />
the Armed Services Vocational Aptitude Battery (ASVAB), including<br />
spatial, perceptual, and psychomotor abilities. Previous<br />
analyses of Project A data (Campbell, 1988) showed that the ASVAB<br />
is useful for predicting first tour performance. Therefore, the<br />
ASVAB serves as a baseline against which the marginal utility of<br />
other tests for selection and classification is judged. This<br />
analysis of data collected during the 1985 Project A Concurrent<br />
Validation attempted to answer three questions:<br />
(1) How much of the variance in comprehensive performance<br />
measures can spatial and perceptual-psychomotor tests account<br />
for, over and above that predicted by ASVAB subtests?<br />
(2) Is either type of test, spatial or perceptual-psychomotor,<br />
more useful for incrementing ASVAB validity?<br />
(3) Which specific Project A tests will make the highest<br />
individual contributions to this incremental validity?<br />
Method<br />
Subjects<br />
Subjects were first-term enlisted personnel in the nine MOS<br />
for which hands-on criterion measures were collected as part of<br />
the 1985 Concurrent Validation phase of Project A. The number of<br />
subjects from each MOS, as well as the total sample size, is<br />
shown in Table 1.<br />
Predictors<br />
Predictors were the nine ASVAB subtests, the six Project A<br />
paper-and-pencil tests of spatial ability, and 14 selected scores<br />
from the ten Project A computerized perceptual-psychomotor tests.<br />
Table 2 presents a list of these predictors, along with the<br />
specific perceptual-psychomotor scores used.<br />
Presented at the meeting of the Military Testing<br />
Association, November, 1990. All statements expressed in this<br />
paper are those of the author and do not necessarily reflect the<br />
Official opinions or policies of the U.S. Army Research Institute<br />
or the Department of the Army.<br />
380
Table 1<br />
Subjects<br />
MOS            Enlisted Job                     N<br />
11B            Infantry                         491<br />
13B            Cannon Crew                      464<br />
19E            Armor Crew                       394<br />
31C            Single Channel Radio Operator    289<br />
63B            Light Wheel Vehicle Mechanic     478<br />
64C (now 88M)  Motor Transport Operator         507<br />
71L            Administrative Specialist        427<br />
91A            Medical Specialist               392<br />
95B            Military Police                  597<br />
TOTAL                                           4,039<br />
Note. Actual sample sizes for some analyses were<br />
smaller than those shown.<br />
Table 2<br />
Predictor Measures<br />
ASVAB Subtests:                      Spatial Ability Tests:<br />
Mechanical Comprehension             Assembling Objects<br />
Auto/Shop Information                Map<br />
Electronics Information              Maze<br />
Math Knowledge                       Object Rotation<br />
Arithmetic Reasoning                 Orientation<br />
Verbal (Paragraph Comprehension      Figural Reasoning<br />
  + Word Knowledge)<br />
General Science<br />
Coding Speed<br />
Number Operations<br />
Perceptual-Psychomotor Tests and Scores:<br />
Target Tracking 1 - accuracy<br />
Target Tracking 2 - accuracy<br />
Target Shoot - accuracy and time-to-fire<br />
Cannon Shoot - time discrepancy (from optimal)<br />
Simple Reaction Time - decision time<br />
Choice Reaction Time - decision time<br />
Short-Term Memory - decision time and proportion correct<br />
Perceptual Speed and Accuracy - decision time and proportion correct<br />
Target Identification - decision time and proportion correct<br />
Number Memory - response time<br />
381
Criterion Measures<br />
All criteria were comprehensive, "can-do" measures of job<br />
performance, as listed and described below.<br />
Total Score on Written Tests: measures of soldiers'<br />
technical knowledge pertinent to the various "critical tasks"<br />
performed in each MOS.<br />
Total Score on Hands-On Tests: measures of soldiers' ability<br />
to actually carry out the 14 to 17 major job tasks in each MOS.<br />
General Soldiering Proficiency: a composite score on written<br />
and hands-on tests of tasks common to many MOS (e.g., determining<br />
grid coordinates on maps, recognizing friendly/threat aircraft).<br />
Core (i.e., MOS-specific) Technical Proficiency: a composite<br />
score on written and hands-on tests of tasks that are at the<br />
"core" of each MOS (i.e., those that define the MOS).<br />
Skill Qualification Test Score (SQT): written tests of MOS-specific<br />
technical knowledge developed by the U.S. Army Training<br />
and Doctrine Command for periodic testing of soldiers in each MOS.<br />
The comprehensive measures above are not mutually exclusive.<br />
Written and hands-on test scores were used in the computation of<br />
General Soldiering and Core Technical Proficiency, as well as the<br />
total scores for written and hands-on tests.<br />
Procedure<br />
Collection of Project A predictor and criterion data was<br />
part of the 1985 concurrent validation. Scores on the ASVAB<br />
subtests and the Skill Qualification Test were obtained from<br />
archival data sources.<br />
A series of backward stepwise multiple regression analyses<br />
was performed separately for each MOS. An SPSS Regression<br />
program sequentially entered blocks of ASVAB, spatial, and<br />
perceptual-psychomotor tests, removing nonsignificant tests in<br />
each block before entering the next block. Two orders of entry<br />
were used. In both cases the ASVAB tests were entered first; in<br />
one analysis spatial tests were entered second, followed by the<br />
perceptual-psychomotor; in the other analysis this order was<br />
reversed. Results were corrected for restriction-of-range in the<br />
ASVAB scores (Lawley formula; Lord and Novick, 1968), and<br />
adjusted for shrinkage (Wherry, 1940).<br />
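The blockwise procedure described above can be sketched in miniature. The code below is a simplified stand-in for the SPSS analysis (synthetic data, hypothetical variable names, ordinary least squares, and the Wherry (1940) shrinkage formula; the Lawley range-restriction correction and the backward removal of nonsignificant tests are omitted):

```python
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary-least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def wherry_adjusted(r2: float, n: int, k: int) -> float:
    """Wherry (1940) shrinkage adjustment: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(0)
n = 400
asvab = rng.normal(size=(n, 3))        # stand-in for ASVAB subtest scores
spatial = rng.normal(size=(n, 2))      # stand-in for spatial test scores
y = asvab @ np.array([0.5, 0.3, 0.2]) + 0.4 * spatial[:, 0] + rng.normal(size=n)

r2_base = r_squared(asvab, y)                               # block 1: ASVAB only
r2_full = r_squared(np.column_stack([asvab, spatial]), y)   # block 2: + spatial
print(f"incremental R^2 of the spatial block: {r2_full - r2_base:.3f}")
print(f"shrunken full-model R^2: {wherry_adjusted(r2_full, n, 5):.3f}")
```

The incremental R2 of the second block is simply the difference between the full-model and ASVAB-only R2, which mirrors how the Stage 2 increments reported below were obtained.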
Results<br />
Table 3 shows the proportion of variance explained (R2) by<br />
the significant predictors of the criteria at each stage of<br />
382
Table 3<br />
Proportion of Criterion Variance (R2) Accounted for by<br />
Significant Predictors (Median Values Across MOS)<br />
Stage                (1)   (2a)     (3a)     (2b)   (3b)<br />
Predictors Retained  ASV   ASV+Sp   ASV+Sp   ASV    ASV+P/M<br />
Predictors Entered   ASV   Sp       P/M      P/M    Sp<br />
Written Tests:       .59   .64      .65      .61    .65<br />
Hands-On Tests:      .29   .33      .33      .31    .33<br />
General Soldiering:  .47   .51      .53      .50    .53<br />
Core Technical:      .44   .49      .50      .48    .51<br />
Skill Qualification: .53   .54      .55      .54    .55<br />
analysis. A comparison of column 1 with columns 3a and 3b<br />
indicates that spatial and perceptual-psychomotor test scores<br />
substantially improved the prediction of Written Test scores,<br />
General Soldiering Proficiency, and Core Technical Proficiency.<br />
Increases in R2s for Hands-On Tests and the Skill Qualification<br />
Test Score were more modest.<br />
Regarding the relative usefulness of spatial vs. perceptual-psychomotor tests for incrementing the prediction of the<br />
criteria, columns 2a and 2b of Table 3 show median incremental<br />
R2s (across MOS) of spatial vs perceptual-psychomotor predictors<br />
at Stage 2. Spatial tests were slightly better than perceptual-psychomotor scores for improving the prediction of the criteria.<br />
The third research question concerns the validities and<br />
incremental validities of individual Project A tests. Table 4<br />
lists the three best spatial, perceptual, and psychomotor tests,<br />
in terms of frequency and magnitude of significant effects. For<br />
the tests of spatial ability, Assembling Objects, Figural<br />
Reasoning, and Map were superior incremental predictors. Among<br />
the perceptual scores, Target Identification (% correct), Short<br />
Term Memory (% correct), and Number Memory (response time) were<br />
especially useful as incremental predictors. For the psychomotor<br />
scores, 1- and 2-Hand Tracking (accuracy), and Target Shoot<br />
(time-to-fire) were the best.<br />
Discussion<br />
In these analyses Project A test scores substantially<br />
improved the prediction of the criteria. The results for Total<br />
Score .on Written Tests and General Soldiering Proficiency support<br />
the wide generalizability of Project A incremental validity.<br />
Specifically, the first measure may involve highly different<br />
content across MOS, while the second measures a set of more<br />
383
Table 4<br />
Best Spatial, Perceptual, and Psychomotor Tests<br />
                                     Number of Equations    Range of Median<br />
Project A                            Where Significant      Semi-partial<br />
Tests                                (Maximum=86)           Correlations<br />
Spatial:<br />
Assembling Objects                   48                     .06 - .11<br />
Figural Reasoning                    40                     .07 - .12<br />
Map                                  33                     .07 - .14<br />
Perceptual:<br />
Target Id. - % correct               32                     .07 - .10<br />
Short Term Memory - % correct        25                     .07 - .14<br />
Number Memory - response time        25                     ns - .07<br />
Psychomotor:<br />
2-Hand Tracking - accuracy           20                     .05 - .15<br />
1-Hand Tracking - accuracy           18                     .05 - .10<br />
Target Shoot - time-to-fire           6                     -.09 - -.07<br />
common tasks, but does so using both written and hands-on scores.<br />
Although spatial tests were slightly superior to the<br />
perceptual-psychomotor scores as incremental predictors, the<br />
latter group of measures accounted for criterion variance which<br />
is not redundant with the spatial tests. This is important<br />
because the perceptual-psychomotor tests require expensive<br />
computer hardware and software and must be administered<br />
individually. Thus, their utility should be considered<br />
separately with each selection or classification decision.<br />
These analyses also revealed that some individual Project A<br />
tests were significant incremental predictors across a wide<br />
variety of MOS and criteria (see Table 4). These measures are<br />
therefore strong candidates for addition to ASVAB.<br />
To interpret these results properly, a number of<br />
methodological considerations should be noted. First of all,<br />
ASVAB scores were employed for selection, while the Project A<br />
scores were used "for research purposes only." Individuals may<br />
have responded more carefully, exerted more effort, etc., on the<br />
ASVAB subtests, thus making them more valid measures of abilities<br />
than the Project A tests. Another concern is a statistical one.<br />
Although the samples used were large enough to make the degree of<br />
shrinkage in each individual equation relatively low, the large<br />
number of equations computed increases the probability that<br />
384<br />
some ASVAB and Project A predictors were significant due to Type<br />
I errors. Although most of the Project A tests were significant<br />
far more often than the chance level (cf. the middle column of<br />
Table 4), the lack of opportunities at this point for cross-validation renders the results reported in this paper exploratory<br />
and suggestive only.<br />
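The chance-level argument in this paragraph can be made concrete. With 86 equations tested at alpha = .05, about four significant semi-partials would be expected by chance alone (treating the tests as independent, which they are not, so this is only a rough baseline):

```python
# Rough baseline for the Type I error argument: expected number of
# spuriously significant results among m tests at level alpha, and the
# probability of at least one, assuming independent tests.
m, alpha = 86, 0.05
expected_false_positives = m * alpha           # expected chance hits
p_at_least_one = 1 - (1 - alpha) ** m
print(round(expected_false_positives, 1))      # 4.3
print(round(p_at_least_one, 3))                # 0.988
```

Counts such as 48 of 86 for Assembling Objects (Table 4) are therefore well above this baseline, which is the point made above.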
The Longitudinal Validation of Project A, which began in<br />
1986/87, will provide more definitive answers to the research<br />
questions involved in these analyses. Based upon the preliminary<br />
results reported here, we are optimistic about the findings of<br />
the Longitudinal Validation.<br />
References<br />
Campbell, C.H. (in preparation). Developing basic criterion<br />
scores for hands-on tests, job knowledge tests, and task<br />
rating scales (Draft of ARI Technical Report).<br />
Campbell, J.P. (Ed.). (1988). Improving the selection,<br />
classification, and utilization of Army enlisted personnel:<br />
Annual report, 1986 fiscal year (ARI Technical Report 792).<br />
Alexandria, VA: U.S. Army Research Institute.<br />
Cohen, J., & Cohen, P. (1983). Applied multiple regression/<br />
correlation analysis for the behavioral sciences. Hillsdale,<br />
NJ: Lawrence Erlbaum Associates.<br />
Davis, R.H., Davis, G.A., Joyner, J.N., & de Vera, M.V. (1987).<br />
Development and field test of job relevant knowledge tests<br />
for selected MOS (ARI Technical Report 757). Alexandria, VA:<br />
U.S. Army Research Institute.<br />
Lord, F., & Novick, M. (1968). Statistical theories of mental<br />
test scores. Reading, MA: Addison-Wesley Publishing Co.<br />
Pedhazur, E.J. (1982). Multiple regression in behavioral<br />
research (2nd ed.). New York, NY: Holt, Rinehart and<br />
Winston.<br />
Peterson, N.G. (Ed.). (1987). Development and field test of the<br />
trial battery for Project A (ARI Technical Report 739).<br />
Alexandria, VA: U.S. Army Research Institute.<br />
Wherry, R.J. (1940). Appendix A. In W.H. Stead and C.P. Shartle<br />
(Eds.), Occupational counseling techniques. New York:<br />
American Book Company.<br />
385<br />
Item Content Validity: Its Relationship<br />
With Item Discrimination and Difficulty<br />
Teresa M. Rushano<br />
USAF Occupational Measurement Squadron<br />
At the USAF Occupational Measurement Squadron (USAFOMS), subject-matter experts<br />
(SMEs) rate the questions on promotion tests for content validity.<br />
They also use standard statistical criteria to determine whether test questions<br />
should be reused on subsequent test revisions. The purpose of this<br />
research was to explore the relationship between SME content validity ratings<br />
(CVRs) and item statistics.<br />
The Specialty Knowledge Tests (SKTs) used for enlisted promotions in the Air<br />
Force are written at USAFOMS by senior NCOs acting as SMEs under the guidance<br />
of USAFOMS psychologists. Within each specialty, one SKT is prepared for<br />
promotion to staff sergeant (E-5), and one for promotion to technical and<br />
master sergeant (E-6 and E-7).<br />
The USAFOMS test development process includes a procedure based on the methodology<br />
of Lawshe (1975) for quantifying content validity on the basis of<br />
essentiality to job performance. As part of the process of revising an existing<br />
SKT, each SME independently assigns each test question a rating using<br />
the following scale:<br />
Is the skill (or knowledge) measured by this test question:<br />
Essential (2),<br />
Useful but not essential (1), or<br />
Not necessary (0),<br />
for successful performance on the job?<br />
The SMEs as a team then use these ratings as a point of departure in discussing<br />
whether individual items should be retained on subsequent test revisions.<br />
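The Lawshe (1975) procedure this is based on reduces such ratings to a content validity ratio, CVR = (n_e - N/2)/(N/2), where n_e is the number of SMEs rating the item essential. A sketch (treating CV-Avg as a simple mean of the 0/1/2 ratings is an assumption, not a description of the exact USAFOMS computation):

```python
def lawshe_cvr(ratings: list[int]) -> float:
    """Lawshe (1975) content validity ratio for one item.
    ratings: one value per SME on the scale above
    (2 = essential, 1 = useful but not essential, 0 = not necessary)."""
    n = len(ratings)
    n_essential = sum(1 for r in ratings if r == 2)
    return (n_essential - n / 2) / (n / 2)

def cv_avg(ratings: list[int]) -> float:
    """Mean SME rating for one item (assumed form of CV-Avg)."""
    return sum(ratings) / len(ratings)

ratings = [2, 2, 2, 1, 2, 0, 2, 2]   # eight hypothetical SME ratings
print(lawshe_cvr(ratings))   # 0.5  (6 of 8 SMEs rated the item essential)
print(cv_avg(ratings))       # 1.625
```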
Perry, Williams, and Stanley (1990) found that CVRs influence SME determination of an item's test-worthiness and its subsequent selection for continued<br />
use or deactivation. However, the ratings are not the only factors which may<br />
impact the SME decision whether to reuse an item on an SKT. After completing<br />
the CVRs, SMEs review item statistics.<br />
For each SKT question, item statistics are provided which indicate how well<br />
an item is doing on the test. USAFOMS has an established set of statistical<br />
criteria for test items which must be met. Test questions that do not meet<br />
these criteria must be revised in order to be incorporated on the revised<br />
version of the test. The two statistical elements examined in this research<br />
are the difficulty index and discrimination index. The difficulty (DIFF) of<br />
a test item, sometimes known as its ease index, is defined as the total percentage<br />
of examinees on a test who selected each choice. The DIFF value for<br />
the correct answer is examined to see if the item as a whole is too easy or<br />
too hard. For example, an item answered correctly by 97% of the examinees is<br />
considered too easy for the purposes of the SKT and would not be reused on<br />
subsequent test revisions.<br />
The second statistical element used in this research is the discrimination<br />
index (DISC). This statistic is calculated for each item choice by subtracting<br />
the percentage of low-scoring examinees (i.e., those scoring in the lower<br />
50% of all examinees) who select a choice, from the percentage of high-scoring<br />
examinees making that choice. If a test question is working properly,<br />
the higher-scoring examinees will answer the question correctly, while the<br />
lower-scoring examinees will select incorrect options. When this occurs, the<br />
correct answer’s DISC will be positive and the incorrect answers will have<br />
negative DISC values.<br />
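A minimal sketch of the two indices as defined here (the function and data are hypothetical, and the median-split detail is simplified relative to operational SKT scoring):

```python
def item_statistics(choices, total_scores, n_options=4):
    """Per-choice DIFF (% of all examinees selecting it) and DISC
    (% of high scorers minus % of low scorers selecting it).
    choices[i] is examinee i's selected option; total_scores[i] is
    that examinee's total test score, used for the median split."""
    n = len(choices)
    order = sorted(range(n), key=lambda i: total_scores[i])
    low = set(order[: n // 2])            # lower-scoring half
    diff, disc = {}, {}
    for opt in range(n_options):
        picked = [i for i in range(n) if choices[i] == opt]
        pct_low = 100 * sum(1 for i in picked if i in low) / len(low)
        pct_high = 100 * sum(1 for i in picked if i not in low) / (n - len(low))
        diff[opt] = 100 * len(picked) / n
        disc[opt] = pct_high - pct_low
    return diff, disc

# Four examinees: the two high scorers pick the keyed answer (option 0).
diff, disc = item_statistics(choices=[0, 0, 1, 2], total_scores=[90, 85, 40, 30])
print(diff[0], disc[0])   # 50.0 100.0 -> positive DISC for the correct answer
```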
METHOD<br />
Content validity ratings and item statistics were obtained from both the E-5<br />
and E-6/7 SKTs of 23 Air Force specialties (AFSs). Table 1 lists the AFSs<br />
examined and their Air Force specialty codes (AFSCs). Using USAFOMS standard<br />
forms, SMEs rated the content validity of each item on the tests they were<br />
revising. The AFSs chosen for this study were those found by Perry et al.<br />
(1990) to have significant (p
Table 1<br />
Air Force Specialties and Specialty Codes<br />
SPECIALTY                               AFSC<br />
Pararescue/Recovery                     115X0<br />
Visual Information Production           231X3<br />
Airfield Management                     271X1<br />
Air Traffic Control                     272X0<br />
Elec. Comp. and Switching Systems       305X4<br />
Maint. Data Systems Analysis            391X0<br />
Missile Systems Maintenance             411X0A<br />
F-15 Avionics Test Station              451X4<br />
FB-111 Avionics Test Station            451X6<br />
Photo. and Sen. Maint. Tac/Recon        455X0A<br />
Photo. and Sen. Maint. Recon/E.O.       455X0B<br />
Air Launched Missile Sys. Maint.        466X0<br />
Comm. Computer System                   491X0<br />
Refrigeration and Air Conditioning      545X0<br />
Construction Equipment                  551X1<br />
Production Control                      555X0<br />
Logistics Plans                         661X0<br />
Information Management                  702X0<br />
Manpower Management                     733X1<br />
Radiology                               903X0<br />
Medical Laboratory                      924X0<br />
Systems Repair                          991X4<br />
Scientific Measurement                  991X5<br />
388
Table 2<br />
Correlation Coefficients of Content Validity Ratings and Item Statistics<br />
15156A *.282 .148 90370 .021 *.230<br />
S156B *.265 .182 92450 *.336 *.263<br />
15176 .009 .148 92470 *.315 *.259<br />
15550A *.240 .177 99154 .064 .055<br />
15570A *.278 ,076 99174 .105 .050<br />
i5550B *.196 .116 99155 .017 .083<br />
i5570B .175 *.210 99175 .004 .130<br />
* Indicates significant correlation (p < .05).<br />
389
an item is for a certain level of test and the two SKTs are constructed independently. Typically, both the specialty training standard (STS) and the<br />
occupational survey report (OSR) which are used in the development of SKTs,<br />
show that different levels of knowledge are required for these ranks and that<br />
different types of tasks are associated with E-5 and E-6/7 positions.<br />
Finally, a fourth post hoc analysis was conducted to examine the test populations<br />
of the 48 SKTs studied. It was hypothesized that SKTs with higher test<br />
populations would more likely be the SKTs with significant correlation between CV-Avg and item statistics, since statistics from higher populations are<br />
more reliable.<br />
The first three post hoc analyses were conducted using chi-square tests of<br />
statistical significance. Even though eight of the 19 career fields with<br />
significant correlation between CV-Avg and DIFF were from the electronic area,<br />
no significant difference was found (p
REFERENCES<br />
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563-575.<br />
Perry, C. M., Williams, J. E., and Stanley, P. P. (1990). Implementation of content validity ratings in Air Force promotion test construction. Proceedings of the 32nd Annual Conference of the Military Testing Association, 1990.<br />
391
The Air Force Medical Evaluation Test, Basic<br />
<strong>Military</strong> Training, and Character of Separation<br />
Edna R. Fiedler’<br />
Wilford Hall Medical Center<br />
Lackland Air Force Base, Texas<br />
Selection procedures and rapid early intervention are two strategies used<br />
by the United States Air Force to reduce the human and monetary costs of<br />
attrition in the enlisted force. Cognitive measures such as the Armed Services<br />
Vocational Aptitude Battery and the Armed Forces Qualification Test have long<br />
been used effectively for academically based screening. Self-reported<br />
biographical data (biodata) and personality measures have been used for<br />
screening noncognitive adaptability.<br />
Armed Services biodata techniques have included the Navy’s Recruit<br />
Background Questionnaire (RBQ) and the Army’s <strong>Military</strong> Applicant Profile (MAP)<br />
and Assessment of Background and Life Experiences (ABLE). Currently, the Navy,<br />
as Executive Agent, designed the Armed Service Applicant Profile, a combination<br />
of the best items from MAP and RBQ (Trent, Quenette, & Laabs, 1990; Laabs,<br />
Trent, & Quenette, 1989).<br />
Other studies have used a variety of personality measures to predict basic<br />
military training attrition. While Spielberger and Barker (1979) studied the<br />
relationships of trait and state anxiety on attrition from basic military<br />
training for both Navy and Air Force recruits, Butters, Retzlaff and Gibertini<br />
(1986) used the Millon Clinical Multiaxial Inventory (MCMI) to predict 80% of<br />
mental health clinic recommended discharge versus return-to-duty dispositions.<br />
McCraw and Bearden (1988) related motivational, demographic, and<br />
personality test scores to technical training school students referred to a<br />
mental health clinic.<br />
Since the 1970’s, the Air Force has used The Air Force Medical Evaluation<br />
Test (AFMET) to screen out those basic recruits likely to attrite from Basic<br />
Military Training. Early work on the development and initial validation of the<br />
instrument included the studies by Lachar (1974), and Guinn, Johnson, and Kenton<br />
(1975). Bloom (1977, 1980, 1983) reported on the ongoing operational aspects of<br />
the program. The interested reader is referred to Crawford's (1990) review of<br />
the history of AFMET. This study reports on the efficacy of the instrument used<br />
in the first phase, the History Opinion Inventory (HOI), for predicting BMT<br />
performance and Character of Separation. In addition, the Gordon Personal<br />
Profile (Gordon) and the Minnesota Multiphasic Personality Inventory (MMPI) are discussed<br />
in relationship to BMT performance and character of separation.<br />
METHOD<br />
Subjects.<br />
The total sample consisted of all USAF enlisted personnel whose total<br />
Active <strong>Military</strong> Service Date was calendar year 1985 through 1989 and who were<br />
also identified by Wilford Hall USAF Medical Center for testing on the AFMET,<br />
or 171,707 subjects (males = 138,601, females = 33,106). The number of<br />
1 Disclaimer: The views expressed in this paper are those of the author and do<br />
not necessarily represent those of the United States Air Force or the Department<br />
of Defense. Acknowledgments: The author thanks Melody Darby and Doris Black for<br />
their assistance in statistical analyses, Calvin Fresne for his assistance in<br />
data management, and Malcolm Ree, Ph.D., for his assistance throughout the study.<br />
392
subjects in each analysis may differ as not all subjects had data on all<br />
variables.<br />
Instruments.<br />
Instruments include the HOI, a 50-item, true-false self-reported history<br />
of legal, antisocial, school, family, and alcohol problems with a weighted total<br />
score range of 0 to 30. Higher scores indicate greater endorsement of problems<br />
prior to service. The Gordon is an 18-item questionnaire in which subjects must<br />
choose which is most and least like them. Four scores were obtained: Social<br />
Ascendancy, Responsibility, Emotional Stability, and Gregariousness. Scores were<br />
entered as percentiles, ranging from 1 to 99. The MMPI, a measure of<br />
psychopathology, has nine clinical and three validity scales, with raw scores<br />
ranging from one to 58.<br />
Procedure.<br />
The HOI was given on the second day of training to all United States<br />
Air Force basic trainees to identify high risk recruits. The Gordon was given on<br />
the 6-0th day of training, during Phase II testing of identified high risk<br />
recruits. Any recruit referred to a credentialed provider based on Phase II<br />
results was given the MMPI (currently the MMPI-2) prior to a clinical evaluation<br />
by a psychologist or psychiatrist. Only after an evaluation by a psychologist<br />
was a recruit recommended for discharge or for return to duty.<br />
Analyses include analysis of variance, pooled variance t-test, Pearson<br />
correlation, multiple regression, Cronbach coefficient alpha and the<br />
Wherry-Gaylord estimation of reliability of composites.<br />
RESULTS<br />
History Opinion Inventory.<br />
Reliability as measured by the Wherry-Gaylord procedure for weighted<br />
composites was .84. Internal consistency among all the items was .57, using<br />
Cronbach's coefficient alpha. The substantially lower reliability using<br />
Cronbach's alpha demonstrates that the instrument is multidimensional and that<br />
the Wherry-Gaylord is the more appropriate index of reliability.<br />
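Cronbach's coefficient alpha, the internal-consistency index reported above, can be computed directly from item scores. A sketch on made-up true/false data (the Wherry-Gaylord composite reliability is not reproduced here):

```python
def cronbach_alpha(items):
    """Cronbach's alpha from a list of item-score lists (one list per
    item, aligned across examinees):
    alpha = (k/(k-1)) * (1 - sum(item variances)/variance(total scores))."""
    k = len(items)
    n = len(items[0])
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))

# Three hypothetical true/false items scored for five examinees.
items = [[1, 1, 0, 1, 0],
         [1, 1, 0, 0, 0],
         [1, 0, 0, 1, 0]]
print(round(cronbach_alpha(items), 2))   # 0.75
```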
Table 1 shows that recruits who graduated from BMT had significantly<br />
lower scores on the HOI than those who were discharged. A correlation analysis,<br />
corrected for unreliability, showed the HOI accounted for 31% (r = .31) of the<br />
predictive efficiency for BMT graduation/discharge.<br />
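"Corrected for unreliability" here refers to the standard correction for attenuation: dividing the observed correlation by the square root of the measures' reliabilities. A sketch (the .28 observed value is illustrative only, chosen to show how an observed correlation corrects to roughly the .31 cited when the HOI reliability of .84 is applied; the study's exact computation is not given):

```python
import math

def disattenuate(r_xy: float, rel_x: float, rel_y: float = 1.0) -> float:
    """Correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Using the Wherry-Gaylord reliability of .84 reported for the HOI,
# an illustrative observed correlation of .28 corrects to about .31.
print(round(disattenuate(0.28, 0.84), 2))   # 0.31
```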
Character of separation was divided into three groups: honorable, less<br />
than honorable, and entry level separation. Significant differences on the HOI<br />
were found among the types of separation, with the entry level separation (ELS)<br />
group accounting for the significant difference, as seen in Table 2.<br />
A correlation analysis, corrected for unreliability, showed the HOI<br />
accounted for 36% of the predictive efficiency for character of separation.<br />
The Gordon Personal Profile Inventory.<br />
Means of the Gordon subscales were significantly different for graduates<br />
vs discharges from BMT. Table 1 depicts these results. As shown in Table 2,<br />
honorable discharges had significantly different average scores on the four<br />
subscales compared to ELS. There was a nonsignificant trend for less than<br />
honorable discharge average scores to be lower than honorable and higher than ELS<br />
for all subscales. Due to the small number of recruits who have so far taken the<br />
Gordon and received less than honorable discharge (N=14), these results were not<br />
reported in the table.<br />
393
Table 1
HOI, Gordon, and BMT Performance

MEASURE               GRADUATED     DISCHARGED    F         T-TEST
HOI                   (N=158,671)   (N=12,011)
  MEAN                  3.1702        5.0417      3.01***   41.18***
  SD                    2.804         4.868
GORDON                (N=2,760)     (N=820)
SOCIAL ASCENDANCY
  MEAN                 57.0000       26.2413      1.18**    26.63***
  SD                   32.318        20.780
RESPONSIBILITY
  MEAN                 57.0000       22.5359      1.47***   32.91***
  SD                   32.430        26.703
EMOTIONAL STABILITY
  MEAN                 46.1513       14.5707      1.85***   31.64***
  SD                   32.300        23.829
SOCIAL GREGARIOUS
  MEAN                 51.1224       25.8902      1.21***   22.39***
  SD                   31.755        28.861

** p < .01   *** p < .001
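The t values in Table 1 come from the pooled-variance t-tests named in the analysis section (the F column appears to be the variance ratio that justifies pooling). The pooled t statistic can be sketched as follows; the sample values in the test are illustrative, not the AFMET data:

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic for two independent groups."""
    # Pooled variance: weighted average of the two group variances.
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    # Standard error of the mean difference, then t.
    se = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se
```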
Table 2
HOI, Gordon, and Character of Separation

                      HONORABLE    LTH¹        ELS          F
HOI                   (N=21,641)   (N=608)     (N=15,603)
  MEAN                  3.42*        3.66*       5.56       1462.15***
  SD                    3.04         3.08        4.64
GORDON                (N=333)                  (N=1,045)
SOCIAL ASCENDANCY
  MEAN                 52.04*                   20.63         61.06***
  SD                   33.24                    31.70
RESPONSIBILITY
  MEAN                 54.02*                   25.76        112.69***
  SD                   32.66                    28.98
EMOTIONAL STABILITY
  MEAN                 43.67*                   17.45        112.56***
  SD                   32.54                    26.08
SOCIAL GREGARIOUS
  MEAN                 45.38*                   28.56         37.70***
  SD                   32.81                    30.13

* significantly different from ELS, p < .001   *** p < .0001

¹These results are not reported for the Gordon because only 14 recruits have taken the Gordon and received a Less than Honorable Discharge.
ELS = Entry Level Separation   LTH = Less Than Honorable

394
The Minnesota Multiphasic Personality Inventory (MMPI).

Table 3 shows the means and standard deviations for the validity and clinical scales of the MMPI by gender and BMT performance. For males, average differences across all scales were statistically significant (p < .001) and T profiles were clinically meaningful. For females, there were no significant differences on one of the validity indexes, L, or on scale Ma, mania. All other measured indices were significant at the .01 level.

Table 4 shows the means and standard deviations for the validity and clinical scales of the MMPI by gender and character of separation. For males, only scales Ma, L, and K did not significantly distinguish between ELS and honorable discharge (p < .001). For females, there were no significant differences among
Table 3
MMPI and BMT Performance

                     MALES                                  FEMALES
SCALE     GRADUATED    DISCHARGED    T          GRADUATED    DISCHARGED    T
          (N=734)      (N=688)                  (N=102)      (N=118)
L   MEAN    4.35         3.62         5.98**      4.06         3.56         1.64
    SD      2.31         2.50                     2.29         2.23
F   MEAN   10.20        17.24        15.83**      9.08        15.64         6.30**
    SD      7.00         9.49                     6.06         9.23
K   MEAN   11.63         9.93         7.24**     11.92        10.15         2.76*
    SD      4.07         3.97                     5.01         4.39
Hs  MEAN    9.86        15.62        15.11**     11.21        17.44         6.33**
    SD      6.80         7.52                     6.78         7.82
D   MEAN   24.15        31.20        17.31**     35.46        32.99         7.88**
    SD      7.61         7.74                     6.16         7.99
Hy  MEAN   21.79        27.08        15.83**     24.20        29.42         6.28**
    SD      5.98         6.58                     5.68         6.67
Pd  MEAN   23.43        27.37        12.25**     24.28        27.31         3.66**
    SD      6.11         6.05                     5.88         6.41
Mf  MEAN   24.96        27.18         8.27**     34.94        36.92         3.01*
    SD      5.11         5.02                     5.15         4.53
Pa  MEAN   13.36        17.34        13.69**     13.11        16.34         4.75**
    SD      5.28         5.65                     5.07         4.98
Pt  MEAN   22.52        31.39        15.07**     24.03        31.75         5.16**
    SD     11.76        10.45                    11.45        10.63
Sc  MEAN   23.48        35.15        15.72**     24.33        33.76         5.06**
    SD     13.55        14.38                    12.41        15.20
Ma  MEAN   21.32        22.42         4.09**     21.24        21.46         0.36
    SD      4.72         5.40                     4.65         4.50
Si  MEAN   32.78        41.95        13.33**     32.74        42.61         5.71**
    SD     13.29        12.64                    12.20        13.44

* p < .01   ** p < .001

395
Table 4
MMPI and Character of Separation

                     MALES                                  FEMALES
SCALE     HONORABLE    ELS           F          HONORABLE    ELS           F
          (N=145)      (N=735)                  (N=26)       (N=126)
L   MEAN    4.2897       5.6857       4.43        3.9231       3.5317       0.39
    SD      2.1243       2.2646                   1.9167       2.2043
F   MEAN   10.6138      16.8395      28.7291**   10.0385      15.2063       4.7042
    SD      6.7373       9.4024                   5.0713       9.1933
K   MEAN   11.2690      10.0027       6.5935*    10.4231      10.2063       1.7106
    SD      4.6685       4.0420                   3.4195       4.3567
Hs  MEAN    9.0552      15.3537      45.3225**   11.5385      17.1905       6.3549*
    SD      6.0882       7.5637                   5.7218       7.8798
D   MEAN   23.2759      30.8204      57.9092**   26.5000      32.5873       6.9555*
    SD      6.7757       7.0753                   5.4498       8.0987
Hy  MEAN   21.3241      26.8381      44.9178**   24.1538      29.2540       6.7660*
    SD      5.7275       6.5806                   5.7667       6.7539
Pd  MEAN   23.4897      27.2327      23.6745**   25.6538      27.0079       2.2758
    SD      5.6349       6.0954                   4.0094       6.5976
Mf  MEAN   24.3862      27.0544      17.4523**   35.4231      36.8254       1.0651
    SD      4.9261       5.1062                   4.7428       4.5274
Pa  MEAN   12.8552      17.1537      35.4004**   12.8462      16.2063       5.6570*
    SD      5.4250       5.6698                   4.5316       5.0045
Pt  MEAN   21.1034      30.8762      50.8987**   25.7692      31.4444       4.3895
    SD     10.5795      10.7211                  10.0332      10.7804
Sc  MEAN   22.6207      34.6014      43.2064**   25.5385      33.3571       4.0622
    SD     12.7247      14.5042                  11.0099      15.1385
Ma  MEAN   21.5793      22.4503       1.8347     20.6923      21.4762       1.3661
    SD      4.5760       5.3862                   3.6306       4.5532
Si  MEAN   32.8062      41.3429      28.6511**   36.8846      41.9683       2.0088
    SD     11.4038      12.9246                  10.8271      13.6510

* = p < .01   ** = p < .0001

396
the scales based on character of separation. As only eleven males and one female who had taken the MMPI had received a less than honorable discharge, this category was not included in the analysis.

CONCLUSIONS

It is concluded that the HOI, as the first part of a psychiatric screening inventory to predict BMT performance, is both reliable and valid. It also predicts character of separation, effectively contrasting those who receive entry level separations from those who are honorably discharged or those who are less than honorably discharged.

Current research on the AFMET will determine the predictive validity, reliability, and clinical meaningfulness of all aspects of the AFMET in relation to Basic Military Training, technical school performance, unfavorable information, eligibility for promotion, and character of separation. Based on these findings, the AFMET will be revised and refined to increase predictive and clinical efficacy.
REFERENCES

Bloom, W. (1977). Air Force Medical Evaluation Tests. USAF Medical Service Digest, 28, 17-20.

Bloom, W. (1980). Air Force Medical Evaluation Test (AFMET) Identifies Psychological Problems Early. USAF Medical Service Digest, 31, 8-9.

Bloom, W. (1985). Changes made, lessons learned after mental health screening. Military Medicine, 148, 889-890.

Butters, M., Retzlaff, P., & Gibertini, M. (1986). Non-adaptability to basic training and the Millon Clinical Multiaxial Inventory. Military Medicine, 151, 574-576.

Crawford, L. (1990). Development and Current Status of USAF Mental Health Screening. Manuscript submitted for publication.

Guinn, N., Johnson, A., & Kenton, J. (1975). Screening for Adaptability to Military Service (AFHRL-TR-75-30). Brooks AFB, TX: Training Systems Division, Air Force Human Resources Laboratory.

Laabs, G., Trent, T., & Quenette, M. (1989). The adaptability screening program: an overview. Proceedings of the 31st Annual Conference of the Military Testing Association, 434-439.

McCraw, R., & Bearden, D. (1988). Motivational and demographic factors in failure to adapt to the military. Military Medicine, 6, 325-328.

Spielberger, C., & Barker, L. (1979). The Relationship of Personality Characteristics to Attrition and Performance Problems of Navy and Air Force Recruits (Contract No. MDA 903-77-C-0190). Orlando, FL: US Navy Training Analysis and Evaluation Group.

Trent, T., Quenette, M., & Laabs, G. (1990, August). An Alternative to High School Diploma for Military Enlistment Qualification. Paper presented at the 98th Annual Convention of the American Psychological Association, Boston, MA.
397
Implementation of the Adaptability Screening Profile (ASP)¹

Thomas Trent, Mary A. Quenette, and Gerald J. Laabs
Testing Systems Department
Navy Personnel Research & Development Center²
San Diego, California
At last year's MTA symposium concerning the implementation of a biographical instrument (Adaptability Screening Profile/ASP) into military enlistment screening (Sellman, 1989), we described technical issues (Trent, 1989), data analysis plans (Waters & Dempsey, 1989), a methodology for controlling item response distortion (Hanson, Hallam & Hough, 1989), and plans for accelerated implementation (Laabs, Trent & Quenette, 1989). While we made considerable progress towards these stated goals, the operational start and field test of the ASP has been postponed while the Armed Services review implementation options. This paper summarizes ASP objectives and updates the research results. In addition, unresolved implementation issues and preliminary plans for the development of a new Department of Defense (DOD) enlistment screening algorithm are described.
The Problem Revisited<br />
Since World War II, the Services' manpower and personnel research laboratories have conducted research on a variety of biographical and other noncognitive assessments for personnel screening (Laurence & Means, 1985). Nonetheless, the quota restriction that the Services place on the proportion of non high school graduates has operated as the primary attrition-controlling screen. As an increasing number of high school "dropouts" earn alternative education credentials (e.g., adult school, high school equivalency certificate, certificates of attendance, and occupational programs), the U.S. Congress and advocacy groups, such as the American Council on Education, have requested DOD to augment educational enlistment criteria with a screening instrument that measures attributes of the individual applicant that are related to adaptation to military life and the probability of completing initial obligated service.
Opposition to basing enlistment eligibility on educational group membership has intensified since a 1987/1988 DOD classification of educational credentials into three eligibility tiers. Table 1 shows that attrition during the first year of enlistment varies considerably across and within the tiers by type of education credential. While Tier I applicants are given highest priority for enlistment³, the attrition rates for adult schoolers (23.6%) and recruits with one
¹Paper presented at the 32nd Annual Conference of the Military Testing Association at Orange Beach, Alabama, November 1990.
²The opinions expressed in this paper are those of the authors, are not official, and do not necessarily represent those of the Navy Department.
³The relatively small numbers of Tier II & Tier III non high school graduate applicants who are selected must also score considerably higher on the Armed Services Vocational Aptitude Battery.
398<br />
school diploma graduates (10.6%, 14.3%, and 13.5%, respectively).
Procedures<br />
Table 1
Twelve Month Attrition Rates by Education Level
DOD Fiscal Year 1988 Accessions(a)

Tier/Education Level                 Number of     Percent
                                     Accessions    Attrition
Tier I
  High School Graduate                 235,388       13.5
  College: One Semester                  2,092       21.1
  College: 2 Yrs or more                 6,228        8.0
  Adult Education                          275       23.6
Tier II
  H.S. Equivalence Certificate           9,843       23.8
  Occ. Program Certificate                  98       14.3
  H.S. Certificate of Attendance
    or Completion                        1,018       19.8
  Correspondence                            87       24.1
  Home Study                                47       10.6
Tier III
  No H.S. Diploma                        5,350       26.6

(a)Non-prior service, active duty, N = 260,426.
Two alternate forms of the ASP (Part 1) were developed, each consisting of 50 items in multiple choice format with two to five response options. The items sampled constructs representing delinquency, academic achievement, career and work orientation, athletic involvement, and social adaptation. Item option scoring weights were developed utilizing Guion's (1965) "horizontal percent" method in a randomly assigned scale construction sample (N = 26,857; Army, Navy, Air Force, and Marine Corps combined samples). This resulted in a three-point item scale and a single total summed score. In a national sample of military applicants (N = 120,175), the mean item reliability (item to total score correlation) was .21 and the estimates of internal consistency were .76 and .74 (coefficient alpha for the two forms). The predictive validity of the ASP was compared to the following measures: Armed Forces Qualification Test (AFQT); education credentials (2 years college, high school diploma, high school equivalency certificate/GED, and no secondary credential); employed at time of application; 17 years of age at time of service entry; and eligibility waiver status as a result of preservice misdemeanor or felony arrests.
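Empirical option keying of the kind attributed above to Guion's "horizontal percent" method can be sketched as follows: each response option's weight is derived from the percentage of its choosers who completed service, banded onto a three-point (0-2) scale. The band cut points below are illustrative assumptions, not the values used in the ASP research:

```python
def horizontal_percent_weights(option_outcomes, cuts=(0.70, 0.85)):
    """option_outcomes: {option: list of 0/1 outcomes, 1 = completed service}.
    Returns {option: weight in {0, 1, 2}} by banding each option's
    completion ("stay") rate at the given cut points.
    The cut points are hypothetical, for illustration only."""
    weights = {}
    for opt, outcomes in option_outcomes.items():
        stay_rate = sum(outcomes) / len(outcomes)
        # Count how many cut points the stay rate meets: 0, 1, or 2.
        weights[opt] = sum(stay_rate >= c for c in cuts)
    return weights
```

Summing the keyed weights across all items then yields the single total score described above.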
399<br />
The criterion is a dichotomous measure of attrition. Personnel who were voluntarily or involuntarily discharged from service prior to the completion of their service contracts were coded "1". Those personnel with medical disability, officer school discharges, service breach of contract, and the dead were excluded from analysis. All other personnel were coded as "0".
The biodata instrument (ASP-1) was administered to all active duty military applicants in the United States for a three-month period (N = 120,175). The sample utilized in the following analyses consisted of 55,675 personnel who enlisted after the applicant administration. The applicant and accession samples were generally representative of military populations (Trent, Quenette, Ward & Laabs, 1990).
Results<br />
Figure 1 graphically portrays average attrition rates at each of the biodata raw score points.

[Figure 1. Attrition rates by ASP-1 score. Graphic not recoverable from the source scan.]
Table 2 shows the simple and incremental validities with the biodata score (ASP-1) forced into the regression equation last. This analysis was performed on a random one-half of the sample ("model construction" group; N = 26,991). Aside from ASP-1 and AFQT, the predictor variables were dummy coded. Validities for high school diploma, two or more years of college, AFQT, age 17, no credential, GED, and ASP were corrected for restriction of range using a univariate formula (Thorndike, 1982). Validities for employment status and misdemeanor/felony were not corrected because operational selection procedures resulted in larger accession sample variances as compared to applicant sample variances. The true unrestricted variance of the misdemeanor/felony measure is unknown since most potential applicants in this category are screened out at the recruiter level and do not reach the applicant testing stage.
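The incremental-validity analysis just described is the standard R-squared-change F test for a predictor forced in last; it can be sketched as:

```python
def delta_r2_F(r2_reduced, r2_full, n, k_full, m_added=1):
    """F statistic for the R-squared change when m_added predictors
    enter a model that ends with k_full predictors, fit on n cases."""
    num = (r2_full - r2_reduced) / m_added
    den = (1.0 - r2_full) / (n - k_full - 1)
    return num / den
```

Plugging in the tabled ASP-1 step (R² of .042 rising to .073, N = 26,991, nine predictors) gives an F near 900, in the neighborhood of the reported 912.0 once the three-decimal rounding of the tabled R² values is allowed for.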
400<br />
criterion (-.09).
Table 2
ASP-1 Incremental Validity - DoD Sample(a)

                             Zero-order          Multiple    Incremental change
Step(e)  Variable(b)         r(c)    rc(d)       R     R2      F        p
1. HS Diploma               -.14    -.19        .14   .021    565.0    .000
2. 2 Years College          -.03    -.04        .17   .030    272.9    .000
3. Employed                 -.09                .19   .036    152.5    .000
4. AFQT Percentile          -.06    -.07        .20   .039     92.8    .000
5. Misdemeanor/Felony        .04                .20   .040     25.3    .000
6. No Credential             .13     .17        .20   .041     23.1    .000
7. GED                       .09     .10        .20   .041     13.8    .000
8. Age 17                    .05     .07        .20   .042      9.2    .002
9. ASP-1                    -.25    -.27        .27   .073    912.0    .000

(a)DOD Accessions, Model Construction Group, N = 26,991.
(b)All predictor variables are indicator variables (dummy 0/1 coded) except ASP-1 and AFQT scores.
(c)All correlations are significant at .05 level.
(d)Correlations (validities) corrected for restriction of range (univariate correction; Thorndike, 1982).
(e)Order of entry of variables in steps 1-8 was determined by prior stepwise procedure. ASP-1 was forced into the equation last.
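The univariate range-restriction correction cited from Thorndike (1982) is conventionally the "Case 2" formula for direct selection: scale the restricted correlation by the SD ratio and renormalize. A sketch, with illustrative values in the checks:

```python
import math

def correct_range_restriction(r, sd_unrestricted, sd_restricted):
    """Univariate (Case 2) correction for direct range restriction:
    r_c = r*u / sqrt(1 - r^2 + r^2 * u^2), where u is the ratio of the
    unrestricted to the restricted standard deviation of the predictor."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / math.sqrt(1.0 - r * r + r * r * u * u)
```

When the applicant-pool SD exceeds the accession-sample SD (u > 1), the corrected validity is larger in magnitude than the observed one, which is why the Table 2 corrected values exceed the zero-order values.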
Conclusions and Implementation Issues

In the research mode, the use of the Adaptability Screening Profile for enlistment screening demonstrated incremental validity beyond operational screens and other potential measures to minimize attrition and to improve the match between the demands of military service and the background and temperament of individuals. The utility of employing the ASP will vary as a function of the selection ratio⁴ and the stability of the ASP in operational mode (see Trent et al., 1990, for a more complete discussion of ASP utility).

The research results support the contention of the American Council on Education that alternatives to the existing three-tier educational quota system are technically feasible. On the other hand, educational attainment has a proven track record of good predictive validity and is in fact one of the most reliable of the biographical measures. From a technical perspective, type of education credential should be included in an array of adaptability indicators that samples the "whole person." The approach of the ASP research program has been to operationalize constructs related to individuals' adaptability to institutions in general and the likelihood of persistence in military training and occupations in particular. The biodata score resulting from the ASP is an economical method of capturing personal background data. In addition, a new research effort is underway at the Navy Personnel

⁴The proportion of qualified recruits needed to meet manpower goals relative to the total number of military applicants.
401<br />
Research and Development Center and the Human Resources Research Organization to<br />
construct a DOD attrition prediction model that could be used in a “compensatory” enlistment<br />
eligibility system (Laurence & Gribben, 1990). In such an algorithm the applicant’s qualifying<br />
score would be determined by a combination of measures such as aptitude test scores and<br />
personal background data, including educational achievement, criminal justice history, and<br />
employment history. The validity of this proposed screening model, as well as plans for DoD<br />
implementation, is planned for presentation at next year’s MTA conference.<br />
Two related issues have stalled the field test of the ASP. In that the principal objective
of the operational test was to evaluate the performance of the self-reported biodata in an<br />
operational mode, eligibility cutting scores were established to eliminate the bottom 10 percent<br />
of otherwise qualified applicants. This was a necessary condition to gain a realistic<br />
environment of recruiter coaching and applicant dissimulation to test for operational score<br />
inflation and possible validity degradation. The prospect of rejecting high school diploma<br />
graduates, especially in the upper “mental groups,” proved to be extremely unpopular among<br />
the Services. Secondly, the DOD is considering the feasibility of avoiding the “multiple<br />
hurdle” impact of the ASP field test by implementing the instrument within the new<br />
compensatory screening algorithm that is under development. Thus, the initial efficacy of the<br />
ASP would rely upon validity estimates from the non-operational administration (N = 120,175).
Until score monitoring provides operational data, the uncertainty about the impact of recruiter<br />
coaching and applicant “faking good” on score distributions and predictive validity will remain<br />
unresolved. At present, the ASP relies upon empirical scoring and verification warning<br />
statements to minimize score inflation. Moreover, experimental studies (e.g., Trent, Atwater<br />
& Abrahams, 1986; Trent, 1987; Hough, Eaton, Dunnette, Kamp & McCloy, 1990) indicate<br />
that the problem of item response distortion is minimal. That is, applicants’ responses do not<br />
demonstrate extreme distortion and validities of biodata instruments are not seriously moderated<br />
by distortion.<br />
REFERENCES

Guion, R. M. (1965). Personnel testing. New York: McGraw-Hill.

Hanson, M. A., Hallam, G. L., & Hough, L. M. (1989, November). Detection of response distortion in the Adaptability Screening Profile (ASP). Paper presented to the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.

Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-related validities of personality constructs and the effect of response distortion on those validities. Journal of Applied Psychology, 75 (5).

Laabs, G. J., Trent, T., & Quenette, M. A. (1989, November). The Adaptability Screening Program: An Overview. Paper presented at the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.

Laurence, J. H., & Means, B. (1985, July). A description and comparison of biographical inventories for military selection (FR-PRD-85-5). Alexandria, VA: Human Resources Research Organization.

Laurence, J. H., & Gribben, M. A. (1990, July). Military selection strategies (FR-PRD-90-15). Alexandria, VA: Human Resources Research Organization.

Sellman, W. S. (1989, November). Implementation of biodata into military enlistment screening. Symposium presented at the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.

Thorndike, R. L. (1982). Applied psychometrics. Boston, MA: Houghton-Mifflin Company.

Trent, T. (1987, August). Armed forces adaptability screening: The problem of item response distortion. Paper presented at the American Psychological Association Convention, New York, NY.

Trent, T. (1989, November). The Adaptability Screening Profile: Technical Issues. Paper presented at the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.

Trent, T., Atwater, D. C., & Abrahams, N. M. (1986, April). Experimental assessment of item response distortion. In Proceedings of the Tenth Psychology in the DOD Symposium. Colorado Springs, CO: U.S. Air Force Academy.

Trent, T., Quenette, M. A., Ward, D. G., & Laabs, G. J. (1990). Armed Services Applicant Profile (ASAP): Development and validation (in review). San Diego, CA: Navy Personnel Research and Development Center.

Waters, B. K., & Dempsey, J. R. (1989, November). Development of the Adaptability Screening Profile score monitoring system. Paper presented to the 31st Annual Conference of the Military Testing Association, San Antonio, Texas.
403<br />
FOR U.S. NAVY TYPING PERFORMANCE TESTS

MASTER CHIEF YEOMAN STEVE D. MCGEE, USN
NAVAL EDUCATION AND TRAINING PROGRAM MANAGEMENT SUPPORT ACTIVITY
The Department of the Navy is considering the use of word processors/personal computers in typing performance tests. Presently these tests are accomplished utilizing electric typewriters. This report presents results of a study to determine the feasibility of using word processors/personal computers versus the electric typewriter for typing performance tests.

Purpose

The purpose of the study is to determine if typing performance tests could be performed with word processors/personal computers, thereby increasing the speed of word production as well as accuracy.
Methodology

Subjects were 513 enlisted U.S. Navy personnel (E-1 through E-6) within the administrative and supply communities that require typing performance tests. Subjects were randomly selected from throughout the Navy. All of the subjects had prior keyboard experience on the typewriter, and 215 subjects had experience with the word processor/personal computer in their normal day-to-day work.

Two official U.S. Navy typing performance tests (practical) were used from series 87 published by NETPMSA. The standard electric typewriters (IBM Selectric, Selectric II, and Selectric III) were utilized for the typewriter portion of both exams, while the word processor/personal computer portion of the exams was administered using the XEROX 13621, IBM PC, WANG PC, ZENITH 245, and CPT.

On day one, the subjects were administered Test A and timed for five minutes using the typewriter. On the same day, they were administered Test A and timed for five minutes using the word processor/personal computer. On day two, the procedures were reversed; for example, the subjects were administered Test B and timed for five minutes using the word processor/personal computer. They were then tested using the typewriter. Scoring was done by line at a rate of five keystrokes per word, with each error subtracting keystrokes from the total stroke count.
Subjects were allowed to use the automatic wrap-around (automatic return) and backspace features on the word processor/personal computer. All tests were properly monitored by local command supervisors.
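The scoring rule described above (five keystrokes per word, a per-error keystroke deduction, five-minute timing) can be sketched as follows. The size of the per-error deduction is partly illegible in the source scan, so `penalty_strokes` is parameterized and its default of 50 is an assumption:

```python
def words_per_minute(total_keystrokes, errors, minutes=5.0,
                     strokes_per_word=5, penalty_strokes=50):
    """Net words per minute: deduct penalty keystrokes per error,
    convert the remaining keystrokes to words at five strokes per word,
    and divide by the timing period. penalty_strokes is an assumed
    value; the figure in the source text is illegible."""
    net = max(total_keystrokes - errors * penalty_strokes, 0)
    return (net / strokes_per_word) / minutes
```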
Results

The results of the administered tests showed that the average word-per-minute (WPM) production was 42.5 using a word processor/personal computer and 35.5 WPM using an electric typewriter. Table 1 is an illustration of these results by Test A and Test B. A repeated measures analysis of variance was conducted on the word processor/personal computer and typewriter words-per-minute data. Post-hoc tests on the data indicate that (a) for Test A the subjects performed significantly better on the word processor/personal computer than on the electric typewriter, (b) for Test B the subjects performed significantly better on the word processor/personal computer than on the electric typewriter, (c) for either test subjects performed equally well using the word processor/personal computer, and (d) using the typewriter subjects performed significantly better on Test B than on Test A. This is illustrated at Table 2, showing a breakdown by paygrade.

The test results demonstrate that productivity is increased by an average of 7 WPM using a word processor/personal computer. Therefore, it would prove advantageous for the U.S. Navy to allow the word processor/personal computer to be used for typing performance tests. Additionally, it is recommended that the word-per-minute requirement be increased by 20% for the Yeoman rating, since these are the individuals who accomplish the majority of the Navy's text typing as opposed to form typing. Furthermore, if the word production requirement for the Yeoman rating is increased by 20%, then the wrap-around and backspace features should be authorized, since this is a feature that is utilized on a day-to-day basis by these typists.
405
[Tables 1 and 2: words per minute by test, device, and paygrade. Graphics not recoverable from the source scan.]
406
--<br />
I
Acute High Altitude Exposure and Exercise<br />
Decrease Marksmanship Accuracy<br />
W.J. Tharion, B.E. Marlowe, R. Kittredge,<br />
R. Hoyt and A. Cymerman<br />
United States Army Research Institute of Environmental Medicine<br />
Natick, Massachusetts 01760<br />
ABSTRACT<br />
Many moderate to high altitude areas occupy militarily<br />
strategic parts of the world. This study quantified the<br />
effects of endurance exercise, acute altitude exposure (AAE)<br />
and extended altitude exposure (EAE) (16 days at 4300 m), on<br />
marksmanship performance. Sixteen experienced male marksmen<br />
fired a de-militarized M-16 rifle equipped with a Noptel ST-<br />
1000 laser system from a standing unsupported position at a<br />
2.3 cm diameter circular target from a distance of 5 m.<br />
Subjects were tested at rest and after a maximal 20.4 km<br />
run/walk ascent from 1800 m to 4300 m, following AAE and EAE.<br />
Sighting time (the interval between a signal light to fire and<br />
trigger pull) and accuracy (distance of shot impact from<br />
target center) were measured. Exercise and time at altitude<br />
had independent effects on marksmanship. Sighting time was<br />
unaffected by exercise, but was 8% longer following EAE (5.61<br />
± 1.25 sec AAE vs 6.06 ± 1.06 sec EAE (mean ± SD), p < .05).<br />
Accuracy was reduced 11% by exercise (3.63 ± 0.69 cm at rest<br />
vs 4.01 ± 0.89 cm post exercise, p < .05).<br />
Subjects<br />
Sixteen soldiers, 18-39 years of age, volunteered for the study.<br />
Subjects were not from altitudes greater than 1500 m, nor had they lived<br />
at such altitudes during the three months prior to the study. All subjects<br />
were experienced marksmen prior to study participation.<br />
Equipment<br />
Marksmanship performance was quantified with a Noptel ST-1000 (Oulu,<br />
Finland) laser marksmanship system. The system consists of a laser<br />
transmitter attached to a de-militarized M-16 rifle, a laser switch, an<br />
optical target, a personal computer, printer, and software provided by<br />
Noptel.<br />
TABLE 1. TESTING SCHEDULE FOR MARKSMANSHIP MEASURES.<br />
DAYS 1-5 SEA LEVEL<br />
Days 1-4 Marksmanship Training<br />
Day 5 Marksmanship Assessment<br />
DAYS 6-23 4300 M ALTITUDE<br />
Day 6 Marksmanship Assessment, Acute Altitude Exposure, Fatigued State<br />
Days 7-9 Marksmanship Assessment, Acute Altitude Exposure, Rested State<br />
Days 10-19 No <strong>Testing</strong><br />
Days 20-22 Marksmanship Assessment, Extended Altitude Exposure, Rested State<br />
Day 23 Marksmanship Assessment, Extended Altitude Exposure, Fatigued State<br />
Procedure<br />
The schedule of testing is shown in Table 1. On Day 6, subjects<br />
ascended (2500 m vertical ascent) 21 km to the summit of Pikes Peak (4300<br />
m) as quickly as possible. Within 5 minutes of completing the<br />
ascent, marksmanship was assessed. Subjects then resided for 16 days at<br />
the summit. On Day 23, subjects were returned to the base of Pikes Peak<br />
for a second ascent and subsequent marksmanship assessment. Each<br />
marksmanship test consisted of a total of 20 shots. Subjects were<br />
instructed to shoot at will for the first ten shots to obtain the best<br />
accuracy score possible. For the second ten shots, subjects were<br />
instructed to shoot as fast as possible without sacrificing accuracy<br />
(speed and accuracy). During the latter assessment, subjects were<br />
required to hold the barrel of the rifle below their waist. Following<br />
a verbal ready signal and a 1-10 sec randomly varied preparatory<br />
interval, subjects were signalled to shoot upon illumination of a red<br />
stimulus light. Subjects shot in the free standing unsupported position<br />
from a distance of 5 m at a 2.3 cm diameter circular target.<br />
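The two accuracy measures reported below, distance from center of mass (DCM) and shot group tightness (SGT), can be computed directly from shot impact coordinates. The sketch below is illustrative only: the function names and sample data are not from the Noptel software, and SGT is taken here as the mean distance of each shot from the group centroid, one common definition.

```python
import math

def dcm(shots):
    """Distance from center of mass: mean radial distance of each
    shot's impact point from the target center at (0, 0), in cm."""
    return sum(math.hypot(x, y) for x, y in shots) / len(shots)

def sgt(shots):
    """Shot group tightness: mean distance of each shot from the
    centroid of the group (smaller = tighter group)."""
    cx = sum(x for x, _ in shots) / len(shots)
    cy = sum(y for _, y in shots) / len(shots)
    return sum(math.hypot(x - cx, y - cy) for x, y in shots) / len(shots)

# Illustrative 5-shot group (cm offsets from the target center)
group = [(1.2, -0.8), (2.5, 0.4), (-1.0, 1.9), (0.3, -2.2), (1.8, 1.1)]
print(round(dcm(group), 2))   # mean miss distance
print(round(sgt(group), 2))   # dispersion about the group centroid
```

Note that DCM mixes bias and dispersion, while SGT isolates dispersion, which is why the two measures can move independently across conditions.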
RESULTS<br />
A significant effect of altitude condition was observed for distance<br />
from center of mass (DCM) (p < .03). Post-hoc t-test analysis revealed<br />
that DCM for the accuracy-only test was greater (p < .05).<br />
The effects of both altitude exposure and fatigue on the various<br />
marksmanship parameters are summarized in Table 2. When shooting<br />
exclusively for accuracy, significant differences assessed via ANOVA<br />
existed for DCM (p < .01) and shot group tightness (SGT) (p < .02). Acute<br />
altitude exposure elicited a greater DCM and a more dispersed shot group<br />
than at sea level or after extended altitude exposure. When shooting for<br />
both speed and accuracy, DCM also differed significantly.<br />
at altitude, shooters also fired more quickly but less accurately.<br />
He suggests that feelings of sickness and increased physical<br />
symptomatology (acute mountain sickness) experienced in the first few<br />
days of altitude exposure lead to lowered motivation to perform well,<br />
presumably because of one's preoccupation with bodily discomfort. It is<br />
also possible that subjects become impatient trying to maintain a good<br />
aiming point with increased body sway encountered at altitude (Fraser,<br />
Eastman, Paul and Porlier, 1987). They may then shoot prematurely,<br />
resulting in the decrease in sighting time. It is speculated that<br />
subjects may feel that taking additional sighting time would not improve<br />
their accuracy. Another possibility may be that subjects' time<br />
estimation is affected. Time may seem to pass more quickly than it<br />
actually does.<br />
Upon acclimatization to altitude, individuals took 8% longer (Acute<br />
Altitude Exposure 5.61 sec vs Extended Altitude Exposure 6.06 sec [means<br />
of rested and fatigue conditions combined]) to sight the target. The<br />
extra time apparently enables increased accuracy of shooting. Increased<br />
respiratory rate is among the physiological adaptations that occur with<br />
acute exposure to altitude; the faster the respiratory rate, the more<br />
breaths that are missed during the breath-holding phase of aiming and<br />
pulling the trigger. This may increase discomfort associated with<br />
breath-holding and thereby decrease sighting time.<br />
While shooting at altitude, DCM, a measure of accuracy, was 11%<br />
greater after exercise (4.01 cm) than for the rested condition (3.63 cm)<br />
[means of acute and extended altitude exposures combined]. Sighting time<br />
was not affected by fatiguing exercise. In contrast to the present<br />
results, Evans (1966) found accuracy was not affected by fatigue but<br />
firing latency was. Other previous findings proposed increased body sway<br />
after exercise as an explanation for reduced shooting accuracy of<br />
soldiers after a forced march (Knapik, Bahrke, Staab, Reynolds, Vogel and<br />
O'Connor, 1990), and biathletes after cross country skiing (Niinimaa and<br />
McAvoy, 1983). Increases in heart rate resulting from intense aerobic<br />
exercise also may impair shooting proficiency. Heart rate control by<br />
beta-blockers (Kruse, Ladefoged, Nielsen, Paulev, and Sorenson, 1986;<br />
Siitonen, Sonck and Janne, 1977) or biofeedback techniques (Daniels and<br />
Hatfield, 1981) are possible remedies.<br />
If military forces are to be prepared for deployment in a high<br />
terrestrial environment, it may be advantageous to have them train<br />
routinely at high altitude. These results showed marksmanship accuracy<br />
returned to normal after two weeks residence at altitude. For events<br />
such as the biathlon and shooting competitions, athletes may benefit from<br />
both acclimation to altitude prior to competition and routine training<br />
at altitude.<br />
Daniels, F.S. & Hatfield, B. (1981). Biofeedback. Motor Skills: Theory<br />
Into Practice, 2, 69-72.<br />
Dusek, E.R. & Hansen, J.E. (1969). Biomedical study of military<br />
performance at high terrestrial elevation. Military Medicine, 134, 1497-<br />
1507.<br />
Evans, W.O. (1966). Performance on a skilled task after physical work<br />
or in a high altitude environment. Perceptual and Motor Skills, 2, 371-<br />
380.
Fraser, W.D., Eastman, D.E., Paul, M.A., & Porlier, J.A.G. (1987).<br />
Decrement in postural control during mild hypobaric hypoxia. Aviation,<br />
Space and Environmental Medicine, 58, 768-772.<br />
Fulco, C.S. & Cymerman, A. (1988). Human performance and acute hypoxia.<br />
In Human Performance Physiology and Environmental Medicine at Terrestrial<br />
Extremes. K.B. Pandolf, M.N. Sawka, and R.R. Gonzalez (editors). Benchmark<br />
Press, Inc., Indianapolis, IN, pp. 467-495.<br />
Knapik, J., Bahrke, M., Staab, J., Reynolds, K., Vogel, J., & O'Connor,<br />
J. (1990). Frequency of loaded road march training and performance on a<br />
loaded road march. United States Army Research Institute of<br />
Environmental Medicine Technical Report. T13-90, pp. 18-25.<br />
Kruse, P., Ladefoged, J., Nielsen, U., Paulev, P.E., & Sorenson, J.P.<br />
(1986). Beta-blockade used in precision sports: effect on pistol shooting<br />
performance. Journal of Applied Physiology, 61, 417-420.<br />
Marlowe, B., Tharion, W., Harman, E., & Rauch, T. (1989). New<br />
computerized method for evaluating marksmanship from Weaponeer<br />
printouts. United States Army Research Institute of Environmental<br />
Medicine Technical Report. T30-90.<br />
Niinimaa, V. & McAvoy, T. (1983). Influence of exercise on body sway<br />
in the standing rifle position. Canadian Journal of Applied Sport<br />
Science, 8, 30-33.<br />
Siitonen, L., Sonck, T., & Janne, J. (1977). Effect of beta-blockade on<br />
performance: use of beta-blockade in bowling and shooting competitions.<br />
Journal of <strong>International</strong> Medical Research, 2, 359-366.<br />
HUMAN PERFORMANCE DATA FOR COMBAT MODELS<br />
COLLINS, Dennis D., Department of the Army, The Pentagon,<br />
Washington, D. C.<br />
The conceptualization of any modern system requires early<br />
integration with its operational environment. The requirement<br />
for early systems integration is particularly important for<br />
military systems which are unique in that they must function<br />
against an enemy intent on their destruction. Survival in this<br />
environment is frequently the principal mission of the system<br />
and also its principal measure of effectiveness. It is the<br />
analytical merger of the conceptual system with its operational<br />
environment which defines both the objective and importance of<br />
military combat modeling.<br />
Current versions of systems development models are virtually<br />
all computer resident. Because of the complexity of systems<br />
development, plus the requirement for many repetitions, modern<br />
combat models are best suited for an automated environment.<br />
Combat models differ from Computer Aided Design/ Computer Aided<br />
Manufacturing (CAD/CAM). CAD/CAM is used to conceptualize and<br />
manufacture a specific system. A systems development combat<br />
model, on the other hand, is used to demonstrate a system's<br />
performance in its anticipated wartime environment performing<br />
against its probable enemy. Combat Models are also unique in<br />
that both the system and the wartime environment are required to<br />
be speculative in order to estimate the probable reality at the<br />
time the system will actually perform its battlefield mission.<br />
Modern data systems provide the capability to view systems<br />
operational performance early in design, allowing elimination of<br />
candidate concepts even before they leave the drawing board.<br />
This relatively new capability to observe "draft" or "notional"<br />
systems inside a model of an operational environment presents<br />
not only new powers of design, but new problems as well. The<br />
process of systems development from design through testing now<br />
takes place inside a computer. Entire technology options and<br />
systems design concepts can be eliminated long before even<br />
drawings are completed. Traditional human factors engineering<br />
begins when the concept of a system is sufficiently firm to<br />
permit the design of at least a mock up of the man-machine<br />
interface such as a cockpit simulator. The combat model, however,<br />
has allowed the selection of first order military technologies<br />
and systems candidates completely inside the notional<br />
reality of a computer.<br />
Because the systems development combat model grew from<br />
analytical communities which were oriented to tactics and<br />
engineering, the representation of human performance parameters<br />
in the evolution of combat models was rarely considered. The<br />
impact of this evolution has been subtle. By omitting human<br />
factors from both enemy and friendly forces, the engineering<br />
modeler intended to deal with the amorphous area of human<br />
factors through a balanced omission: Since neither side showed<br />
human factors, the effect was balanced and should have had no<br />
effect on the tactical or engineering conclusions drawn from the<br />
model's output. In early tactical wargames and engineering<br />
models, this approach was reasonable because the computers of<br />
the day were functional only in aggregated, "low resolution"<br />
modeling. Low resolution models provided valuable tactical<br />
insights, but little information about specific systems.<br />
Engineering models were also simple: one tank fired at another<br />
in a straightforward duel format.<br />
As wargames became automated, the ability to conduct<br />
high-resolution simulation allowed the tactical and engineering<br />
modeling of actual systems in dynamic combat. Automation of<br />
wargames also made omission of human factors both unnecessary<br />
and problematic. "Balanced omission" of human factors in<br />
systems development combat models is more accurately described<br />
as the actual modeling of the human as 100% effective. By<br />
failing to properly consider the human component of systems<br />
performance, the human has an assumed value of 100% effectiveness.<br />
It is generally accepted, even among combat modelers, that<br />
this assumption has the effect of exaggerating systems performance,<br />
and accelerating the tactical pace of a battle.<br />
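The distortion can be made concrete with a toy stochastic duel. In the sketch below, the hit probabilities, the alternating-fire format, and the multiplicative treatment of human effectiveness are all illustrative assumptions, not drawn from any fielded combat model; scaling one side's single-shot kill probability by a human-effectiveness factor below 1.0 visibly shifts an exchange that a "balanced omission" model would score as even.

```python
import random

def duel(p_hit_a, p_hit_b, trials=10_000, seed=1):
    """Alternating-fire duel; returns the fraction of trials side A wins."""
    rng = random.Random(seed)
    a_wins = 0
    for _ in range(trials):
        while True:
            if rng.random() < p_hit_a:   # A fires first each exchange
                a_wins += 1
                break
            if rng.random() < p_hit_b:   # B returns fire
                break
    return a_wins / trials

base = 0.30    # nominal single-shot kill probability for both systems
human = 0.75   # assumed human-effectiveness factor applied to side B
print(duel(base, base))          # humans omitted: near-even (A's first shot helps)
print(duel(base, base * human))  # B's crews modeled as imperfect: A wins more often
```

Even this crude model shows the direction of the bias: leaving the human at an implicit 1.0 on both sides changes not just absolute kill rates but the relative outcome whenever the real human decrements would differ between sides.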
The two original clients of the combat model, wargamers and<br />
hardware engineers, have had an understandable lack of interest<br />
in representing the human factor component of systems performance<br />
as anything other than 1.0. Human performance parameters<br />
are still much less defined than hardware performance parameters,<br />
and no clear consensus has emerged as to how human factors<br />
should be modeled. The case for improving the representation of<br />
human factors in systems development combat models focuses on<br />
the impact of modeling humans as 100% effective. Notional<br />
systems over-perform and technologies and systems candidates are<br />
eliminated in an occult process long before their interaction<br />
with the human dimension can be measured.<br />
There are additional dimensions to the dilemma of human<br />
factors in combat models. Combat model proponents have a<br />
somewhat justified view of their critics as romantics who wax<br />
philosophical about the value of such human traits as leadership,<br />
morale and courage on the battlefield, but cannot<br />
quantify these dimensions in order that they be shown as "independent<br />
variables" in the outcome of analytical combat.<br />
A proposed approach for change is outlined in figure 1. A<br />
first step would be to identify the combat models most often<br />
used in the design and selection of systems. While this step<br />
may appear obvious, there could be a drift toward models which<br />
have little impact on systems development, but are easily<br />
modified for human dimensions. Systems development models are<br />
usually sophisticated engineering development models which do<br />
not lend themselves to human dimension integration. Subsequent<br />
steps, in turn, would be:<br />
-Select those systems for study which require "man-in-the-loop"<br />
for optimal functioning. Good candidates for study are those<br />
systems which depend upon humans for the performance of critical<br />
functions. The intent, early in a human dimensions integration<br />
program, is to pick those systems for study which are likely to<br />
show the importance of human dimensions, even when only limited<br />
human performance is modeled.<br />
-Select human systems tasks which are currently modeled by<br />
implication (i.e. man as 1.0) and for which data can be obtained,<br />
such as "acquire target", "identify target", or "lock-on<br />
target and fire". When systems are conceived, their designers<br />
allocate some tasks to man, some to the machine and some to both<br />
man and machine. A combat aircraft, for example, might acquire<br />
a target automatically through the system itself, depend on its<br />
operator for correct identification and attack decision, then<br />
return control to the system for attack launch and execution. In<br />
some highly sophisticated design processes using elaborate task<br />
analysis this process is formal. More often it is informal.<br />
Selection of human-critical tasks will, like the first step,<br />
increase the likelihood that human variance will have an<br />
independent-variable impact on model outcome.<br />
-Modify the selected model to allow replication of the discrete<br />
human functions selected. Actual model algorithms need not be<br />
complex. The initial modifications need only demonstrate that<br />
the human tasks selected do, in fact, influence the outcome of<br />
the analysis as shown by the measures of effectiveness. Modifying<br />
complex models to show the more discrete human functions<br />
such as suppressed action due to fear or diminished target<br />
acquisition due to cognitive overload is within our current<br />
capability. Some models already represent these functions to<br />
a degree.<br />
-Run the model with the human factors modifications using the<br />
best available data.<br />
-Compare the model output (systems exchange ratios, force<br />
exchange ratios, etc.) between the basic combat model and the<br />
human factors modification. At this point human performance<br />
can be observed in a quantified fashion which is both understandable<br />
and acceptable to the senior engineering design<br />
community.<br />
-Demonstrate the value of human factors algorithms in combat<br />
modeling through the (hopefully) significant differences between<br />
the basic and human factors modified model.<br />
Using those systems tasks which are frequently assigned to<br />
humans in systems design (identify friend-or-foe, for example),<br />
develop a plan for the collection of human performance task<br />
data:<br />
First, search for existing data with high human factors and<br />
engineering community acceptance. In other words, use what we<br />
have first. This approach is particularly important early in<br />
the effort when the needs for combat modeling data are ill<br />
defined. Data collected without a good understanding of how it<br />
will be used is likely to go unused. As the process matures,<br />
the personnel data development and modeling communities will<br />
develop an understanding of one another's needs and a protocol<br />
for data communication will evolve.<br />
Second, develop data through the use of cost effective means<br />
such as developmental tests in training simulators. Since<br />
personnel data formats for combat models are likely to evolve,<br />
the costly process of test or field developed data is likely to<br />
be wasted due to inevitable changes. The new family of flight<br />
and vehicle simulators offers an excellent opportunity to<br />
collect human performance data for combat model input.<br />
Third, develop data through field operations research. An<br />
excellent example of this concept was the Fire Fighting Task<br />
Force study sponsored by the U.S. Army's Concepts Analysis<br />
Agency in Bethesda, Maryland. The Fire Fighting Task Force<br />
studied the psychological impact of stress caused by U. S. Army<br />
Infantry units fighting the Yellowstone National Forest fire in<br />
1988. This type of effort not only generates data for use in<br />
modeling, but contributes to our understanding of combat theory.<br />
Finally, loop early data development back to the human factors<br />
modified model in order to demonstrate human factors as an<br />
independent variable in the outcome of combat and document those<br />
human variables which warrant further developmental research.<br />
This loop-back function will automatically develop personnel<br />
combat modeling data protocol as a by-product.<br />
Figure 1<br />
A Paradigm for the Integration of Human<br />
Factors in Combat Models<br />
(Flowchart: identify models used in systems design/systems development;<br />
refine the selection for "man-in-the-loop"; select tasks currently modeled;<br />
modify the selected models; run the selected models as modified; develop<br />
data through simulation and through field collection.)<br />
TRADING OFF PERFORMANCE, TRAINING, AND EQUIPMENT<br />
FACTORS TO ACHIEVE SIMILAR PERFORMANCE<br />
Janet J. Turnage, University of Central Florida<br />
Robert S. Kennedy, Essex Corporation<br />
Marshall B. Jones, Pennsylvania State University<br />
INTRODUCTION<br />
<strong>Military</strong> systems performance is generally the joint outcome of the human<br />
interacting with the machine. It is convenient to think of this outcome in<br />
terms of causal models, where the elements that determine or drive systems<br />
performance may be relegated to equipment characteristics, training variables,<br />
and individual capabilities. We suggest that an appropriate starting place<br />
for such causal analyses is to select a specific level of desired operational<br />
or systems performance beforehand. We call this desired level Isoperformance.<br />
Then one can employ the different potential predictors of this outcome in an<br />
Isoperformance model. The way the model works is to select each variable as a<br />
potential predictor and then submit it to a trade-off methodology whereby each<br />
variable is compared in light of total operational proficiency desired.<br />
Examples of operational proficiency might be: (1) escape from an aircraft<br />
water crash within 60 seconds, (2) completion of a forced march carrying a<br />
36-pound pack for 20 miles within 6 hours, (3) an 80% carrier landing<br />
boarding rate, or (4) control of 20 aircraft in the same airspace simultaneously.<br />
There are three meanings of the term “Isoperformance.” The first is a<br />
conceptual approach to human factoring. Second, the term may describe a<br />
curve, plotted against training time on the abscissa and aptitude on the<br />
ordinate. Third, Isoperformance is a specific interactive computer program.<br />
In this paper, we shall describe each of these features, in turn, and will<br />
present an illustration of one type of application.<br />
But first, let us more firmly specify the premise of Isoperformance. The<br />
premise is that the same (Iso) total systems efficiency (performance) is a<br />
function of trade-offs between personnel, training, and equipment. To achieve<br />
this state of affairs, the Isoperformance model is intended to:<br />
(1) Make estimates of training outcomes for different categories of<br />
personnel;<br />
(2) Check internal consistency of estimates;<br />
(3) Compare estimates to known relations from human engineering,<br />
personnel, and training;<br />
(4) Counsel how to change "wrong" estimates;<br />
(5) Output Isoperformance curves; and<br />
(6) Leave a hard-copy audit trail.<br />
Isoperformance as a Conceptual Approach to Human Factoring<br />
Isoperformance was inspired by a long history of involvement in human<br />
engineering research, development, test, and evaluation. For example,<br />
military specifications and standards form the basis for numerous systems<br />
requirements, but their number and complexity often make trade-off decisions<br />
difficult because there is no context for their cost. The literature has not<br />
helped either. A U.S. Air Force review of 114 human factors studies from<br />
1958-72 found that the physical characteristics of the stimulus were most<br />
often the significant factors in performance outcomes and there were few<br />
interactions.<br />
But these studies tended to ignore the contribution of practice<br />
or individual differences. This general finding suggested the multivariate<br />
(holistic) approach that was subsequently employed by the Navy in over a<br />
decade of simulator research, ranging from carrier landing, to air-to-ground<br />
combat, to Vertical Take-Off and Landing (VTOL) studies. The strong inference<br />
suggested by the results of these later studies was that people accounted for<br />
the most variance in performance, followed by training manipulations, and then<br />
by equipment variations.<br />
In this work, what was surprising was the modest amount of performance<br />
variance that could be accounted for by equipment features. In partitioning<br />
the performance variances over numerous experiments, equipment accounted for<br />
15-20%, trials of practice accounted for 10-25%, and people accounted for<br />
50-80%, with error usually in the 25-50% range. Again, interactions were few<br />
and far between. This implied that these main effects could be traded off if<br />
you started with them in the first place! Thus was born Isoperformance. That<br />
is, because some pilots are simply better than others, because repeated hops<br />
are costly, and because costly changes in equipment features may produce<br />
minimal changes in performance, we should concentrate on trade-offs among<br />
these known relations to bring about desired goals rather than only one or<br />
another mechanism. After the relative contributions are determined, a price<br />
tag can be placed on all dimensions and the cheapest solution sought.<br />
Isoperformance is therefore designed to accomplish Personnel, Training and<br />
Equipment trade-offs. In the Isoperformance complete program, the term<br />
Personnel can represent such features as sensory capabilities, cognitive and<br />
information processing abilities, anthropometry, or test scores, such as the<br />
Armed Services Vocational Aptitude Battery (ASVAB) scores (presently the<br />
default condition). The term Training can represent such features as<br />
practice, sequence and series effects, learning, training regimens and<br />
schedules, or number of sessions. The default condition is trial of practice.<br />
Similarly, the term Equipment can represent new vs. old, smart vs. dumb,<br />
hi-fidelity vs. low-fidelity, or any general A vs. B configuration, which is<br />
the default condition. The data sources for estimates of the scale values for<br />
these variables can come from various origins, including lay opinion, the<br />
scientific literature, explicit experiments, or technical data bases.<br />
Isoperformance Curves<br />
Figure 1 presents an illustration of Isoperformance using two categories<br />
and one equipment feature (the current technology).<br />
The Personnel category is divided into high- and low- to medium-aptitude<br />
groups, the Training time allotted is 9 weeks, and the proportion of people<br />
desired to complete the training successfully is set at 50%. One can<br />
see that it takes the high-aptitude group only 4 weeks to achieve the same<br />
proficiency level that it takes the low- and medium-aptitude group to achieve<br />
in 8 weeks.<br />
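The trade-off in this example can be mimicked with a simple learning-curve model. In the sketch below, the proportion of a category reaching proficiency after t weeks is assumed to follow 1 - exp(-rt), with the rate r standing in for aptitude; the rates and the 50% criterion are illustrative assumptions, not the curves behind the actual figure.

```python
import math

def proportion_proficient(rate, weeks):
    """Assumed learning curve: fraction of the category proficient."""
    return 1.0 - math.exp(-rate * weeks)

def weeks_to_reach(rate, target):
    """Invert the curve: training time needed for a target proportion."""
    return -math.log(1.0 - target) / rate

high_apt, low_med_apt = 0.17, 0.087   # illustrative weekly learning rates
target = 0.50                          # criterion proportion from the example
print(round(weeks_to_reach(high_apt, target), 1))     # ~4 weeks
print(round(weeks_to_reach(low_med_apt, target), 1))  # ~8 weeks
```

Any point where the two groups' curves cross the same target proportion is, by construction, an iso-performance pair: different aptitude, different training time, identical outcome.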
Figure 1. Illustration of Isoperformance Using Two<br />
Categories and One Equipment Feature (the Current Technology)<br />
(Plot: proportion of people on the ordinate versus training time in weeks<br />
on the abscissa, for the high- and low-to-medium-aptitude groups.)<br />
Figure 2 shows that, if a second new equipment or technology feature is<br />
introduced (which reduces the training time to reach proficiency), then the<br />
time it takes for the groups to achieve the 50% criterion of proficiency<br />
reduces to 3 weeks and 6 weeks, respectively.<br />
Figure 2. If a Second Equipment Feature Is Introduced<br />
(Which Reduces the Training Time to Reach Proficiency)<br />
(Same axes as Figure 1.)<br />
Figure 3 illustrates the relation between category and equipment<br />
differences in terms of Isoperformance curves, where any point on the curve<br />
identifies the same (Iso) performance. Note that the Equipment difference is<br />
smaller than the “Personnel” difference.<br />
Figure 3. Isoperformance Curves Relating Equipment<br />
Differences to Aptitude Differences<br />
From these types of Isoperformance curves, one can determine feasible<br />
combinations of personnel, training, and equipment features for any specified<br />
level of desired performance. In addition, one can rule out various<br />
combinations if there are constraints, for example, on personnel availability<br />
or training time.<br />
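That screening step amounts to a small constrained search. In the sketch below, every cost, time, and category label is invented for illustration: enumerate the (personnel category, equipment) pairs, keep those whose iso-performance training time fits the constraint, and take the cheapest.

```python
# Hypothetical weeks-to-criterion for each (personnel, equipment) pair,
# as would be read off a family of isoperformance curves.
weeks_needed = {
    ("high", "current"): 4, ("low-med", "current"): 8,
    ("high", "new"): 3,     ("low-med", "new"): 6,
}
# Hypothetical unit costs (arbitrary units).
personnel_cost = {"high": 9, "low-med": 4}   # recruiting/retention premium
equipment_cost = {"current": 2, "new": 7}
weekly_training_cost = 1
max_weeks = 7                                # constraint: training pipeline

feasible = []
for (person, equip), weeks in weeks_needed.items():
    if weeks <= max_weeks:                   # rule out over-long pipelines
        cost = (personnel_cost[person] + equipment_cost[equip]
                + weekly_training_cost * weeks)
        feasible.append((cost, person, equip, weeks))

for cost, person, equip, weeks in sorted(feasible):
    print(f"{person:8s} + {equip:8s}: {weeks} wk, cost {cost}")
```

Because every surviving combination delivers the same performance by definition, the final choice reduces to a pure cost comparison, which is the point of the approach.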
Isoperformance as a Computerized Program*<br />
An interactive, expert decision aid has been developed to quantify the<br />
trade-off methodology implicit in the Isoperformance approach. The<br />
computer-based "smart" system is intended to aid in decisionmaking by<br />
mechanizing the trade-offs between human (aptitude, training) and equipment<br />
variations in order to achieve the same (Iso) system performance outcome. The<br />
Isoperformance core subprogram is composed of four phases: Specification,<br />
Input, Verification, and Output.<br />
Specification, the first phase of the Isoperformance program, requires the<br />
user to state the problem, in effect, by specifying:<br />
(1) the system under study,<br />
(2) what is meant by “proficient performance”,<br />
(3) the aptitude dimension to be used,<br />
(4) how that dimension is to be divided into ranges or “personnel”<br />
categories, and<br />
(5) the maximum amount of training to be considered.<br />
These specifications are purely descriptive and no relationships have to be<br />
estimated.<br />
Input, the second phase, asks that, for each personnel category, the user<br />
estimate:<br />
(1) the minimum training time, in number of weeks, necessary for people in<br />
that category to become proficient and<br />
(2) the proportion of persons in the category who are expected to become<br />
proficient given the maximum training time.<br />
The program thus takes as input from the user two to three estimates for<br />
each aptitude category. The estimates can come from any reliable source<br />
(e.g., simulators, extrapolations from related tasks, etc.). Estimations are<br />
planned for because it is expected that the data required are not readily<br />
available at the present time. However, the input can also be data from<br />
technical data banks if available.<br />
In the third stage, Verification, the Isoperformance program checks to<br />
make sure that input estimates are "reasonable." These checks are conducted<br />
whether the input data are "estimates" or actual data from a data base or<br />
experiment. There are three types of checks on user estimates: (a) formal,<br />
*Copies of a demonstration disc are available from R. S. Kennedy, Essex<br />
Corporation, 1040 Woodcock Road, #227, Orlando, FL 32803<br />
which is a check of logical necessity, (b) general, which compares estimates<br />
with known regularities, and (c) specific, which compares user input with<br />
library validities. In general, an implicit correlation between aptitude and<br />
the performance dimension on which "proficiency" is defined can be calculated<br />
at every level of training. Also, the implicit correlations should decrease<br />
with training, and the Isoperformance curves should be decreasing and<br />
negatively accelerated. The results of these checks are reported and<br />
explained to the user, together with suggestions as to how the estimate might<br />
be modified to coincide with known regularities and ranges. The fourth phase,<br />
Output, is simply the computer output from the preceding phases.<br />
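As a sketch, the formal (logical-necessity) and general (known-regularity) checks of the Verification phase might look like the following. The example inputs, the ordering convention (categories listed low to high aptitude), and the check wording are assumptions for illustration:

```python
def formal_checks(min_weeks, prop_proficient, max_weeks):
    """Logical-necessity checks on the user's two estimates per category."""
    problems = []
    for cat, (w, p) in enumerate(zip(min_weeks, prop_proficient)):
        if not (0 < w <= max_weeks):
            problems.append(f"category {cat}: minimum time {w} outside 0..{max_weeks}")
        if not (0.0 <= p <= 1.0):
            problems.append(f"category {cat}: proportion {p} is not a probability")
    return problems

def general_checks(min_weeks, prop_proficient):
    """Known-regularity checks: higher-aptitude categories should need no
    more training and should reach proficiency at least as often."""
    problems = []
    for cat in range(1, len(min_weeks)):
        if min_weeks[cat] > min_weeks[cat - 1]:
            problems.append(f"category {cat}: needs more training than category {cat - 1}")
        if prop_proficient[cat] < prop_proficient[cat - 1]:
            problems.append(f"category {cat}: lower proportion proficient than {cat - 1}")
    return problems

# Invented example inputs, categories ordered low -> high aptitude.
min_weeks = [6, 4, 3]
prop_proficient = [0.55, 0.70, 0.90]
report = (formal_checks(min_weeks, prop_proficient, max_weeks=9)
          + general_checks(min_weeks, prop_proficient))
print(report or "all checks passed")
```

The specific check (comparison with library validities) would work the same way but requires the stored validity data described in the text.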
AN APPLICATION<br />
The Isoperformance methodology can be applied to numerous human factors<br />
areas. Here, we will use as an exemplar freedom from simulator sickness in<br />
ground-based flight trainers. Motion sickness is a common problem in the<br />
military, particularly in testing and simulation devices. Virtually everyone<br />
with intact organs of equilibrium is susceptible to one form or another, but<br />
some people get sick all the time while others are virtually immune. However,<br />
we know that practice usually results in adaptation to motion sickness, and<br />
some specific equipment configurations are more conducive to adaptation than<br />
others (e.g., 0.2 Hz).<br />
An example of the approach for applying Isoperformance to simulator<br />
sickness is as follows:<br />
(1) Obtain a large data base with simulator sickness incidence,<br />
(2) Determine the relationship for each variable,<br />
(3) Isolate variables which are causal,<br />
(4) Select acceptable Isoperformance levels,<br />
(5) Calculate Isoperformance curves using two continuous causal variables<br />
as X/Y and one dichotomous causal variable as comparison.<br />
Afterwards, it is possible to put cost values on the outcomes and determine<br />
trade-offs from which decisions can be made.<br />
Therefore, we took from our large data base (N > 1000) of simulator<br />
sickness a series of correlational relationships. We cast them into a<br />
multiple regression equation and obtained the beta weights for such continuous<br />
variables as length of hop, whether visuals are on/off, field of view, usual<br />
state of fitness, etc. Using the continuous variables, plus the dichotomous<br />
fit/unfit dimension, we created Figure 4. Note that a four and one-half hour<br />
hop using a 305-degree field of view for a pilot who was fit would have the<br />
same simulator sickness score (110) as a pilot who had been ill and flew a<br />
two-hour hop with a 195-degree field of view.<br />
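The trade-off behind that kind of equivalence can be sketched with a toy linear model. Every coefficient below is invented for illustration only; the paper's actual beta weights came from the regression on its N > 1000 data base, so the numbers here do not reproduce the 110-point example above:

```python
# Invented coefficients: intercept, hop length (hours), field of view
# (degrees), and fitness (1 = usual fitness, 0 = recent illness).
B0, B_HOP, B_FOV, B_FIT = 20.0, 10.0, 0.2, -30.0

def sickness_score(hop_hours, fov_degrees, fit):
    """Toy linear model: score rises with hop length and field of view and
    is lower for pilots in their usual state of fitness."""
    return B0 + B_HOP * hop_hours + B_FOV * fov_degrees + B_FIT * fit

def fov_for_score(target, hop_hours, fit):
    """Solve the linear model for the field of view giving a fixed (Iso) score."""
    return (target - B0 - B_HOP * hop_hours - B_FIT * fit) / B_FOV

# One point on an isoperformance curve: a fit pilot on a long hop ...
target = sickness_score(hop_hours=4.5, fov_degrees=305, fit=1)
# ... traded against a shorter hop for a pilot who had been ill.
fov = fov_for_score(target, hop_hours=2.0, fit=0)
assert abs(sickness_score(2.0, fov, 0) - target) < 1e-9  # same (Iso) score
print(f"equivalent configuration: 2.0 h hop, {fov:.0f} degree field of view")
```

Sweeping `hop_hours` and re-solving for `fov` traces out one curve of Figure 4; repeating with `fit` flipped gives the comparison curve.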
[Figure: isoperformance curves (visuals on) of estimated hop length in hours (0.5-5.5) against field of view in degrees, comparing pilots of usual fitness with pilots reporting recent illness.]<br />
Figure 4. Isoperformance Curves Comparing Simulator Field<br />
of View, Hop Length, and Pilots' Report of Recent Illness<br />
CONCLUSIONS<br />
In Navy flight simulator studies, half of the variance appears to be<br />
attributable to Personnel differences, with Training and Equipment dividing<br />
the rest. Therefore, it is more informative to know who is flying than what<br />
trial of practice or on what equipment they are flying. The case for<br />
simulator sickness is similar. It appears that a considerable amount of<br />
variance in motion sickness research is attributable to Personnel differences,<br />
with smaller proportions attributable to Equipment and Practice.<br />
In general, we believe that Isoperformance goals have merit because they<br />
estimate training outcomes by:<br />
(1) Forcing the user to make estimates of training outcomes for different<br />
personnel categories.<br />
(2) Providing checks on the internal consistency and logical coherence of<br />
these estimates.<br />
(3) Providing checks on how well the estimates conform to known<br />
regularities from human engineering, personnel, and training research.<br />
(4) Informing the user as to the results of these checks, together with<br />
information about what can be done to make estimates consistent or<br />
bring them into closer conformity with known regularities and facts.<br />
(5) Leaving a hard-copy audit trail of all estimates, feedback, and<br />
outputted Isoperformance curves.<br />
Implementation of this model can help human engineering practitioners,<br />
training systems designers, or human resource managers compare the relative<br />
costs of differing combinations that lead to the same performance level. This<br />
trade-off technology is especially relevant today given projected constraints<br />
in military budgets.<br />
FINAL REPORT, COMPUTER ASSISTED GUIDANCE INFORMATION SYSTEMS<br />
BAYES, Andrew H. Defense Activity for Non-Traditional Education<br />
Support, Pensacola, FL 32509-7400<br />
INTRODUCTION<br />
In June 1989 DANTES released a final report covering the pilot study of four<br />
computer-based guidance information delivery systems. In this report, a major<br />
recommendation was that the pilot study be expanded and additional data be<br />
gathered.<br />
The pilot study was expanded to a total of 102 sites in all active duty<br />
Services and to two Air Force Reserve sites. Regional training was conducted<br />
and only those sites that attended training were given the software.<br />
Each site was given User Surveys (Tab A) to be completed by each participant.<br />
The data from these surveys have been summarized in this report.<br />
SOFTWARE<br />
Based upon the results of the pilot study, DISCOVER by American College<br />
Testing and GIS by Houghton-Mifflin/Riverside were the software systems used<br />
in this expanded pilot study. They were chosen because they were the two<br />
highest rated by both the counselors and the clients. While they represent<br />
two different styles of counseling, they both contained the same basic<br />
modules and information. GIS provides more specific information on<br />
occupations and education, while DISCOVER uses the more traditional counseling<br />
approach.<br />
DATA COLLECTION<br />
The User Surveys requested demographic data as well as data reflecting the<br />
reactions of the clients to the software. It is interesting to note the data<br />
trends when pay grade or education is compared to reactions.<br />
STATEMENT OF PURPOSE<br />
* To determine if these systems were meeting expressed needs of the client<br />
population.<br />
* To determine if these systems were a valuable addition to the resources<br />
available from the education centers.<br />
* To determine what if any additional data bases would be valuable.<br />
* To determine if the systems were cost effective.<br />
* To determine if counselor time was better utilized as a result of clients<br />
having used CAGIS.<br />
DATA ANALYSIS AND INTERPRETATION<br />
While over 800 User Surveys were analyzed, the numbers do not remain constant<br />
because in some cases, the pay grade was not available, not all questions were<br />
answered by all respondents, or directions for completing the forms were not<br />
followed. It is felt, however, that enough data were collected that the results<br />
are valid and do represent the cross section of clients visiting the education<br />
centers. It would be risky, however, to extrapolate from this data to the<br />
entire Military population. In a sense, the data here represent only the<br />
reactions of individuals visiting the education centers and this may be a<br />
special sub-population. No attempt has been made to compare this population<br />
with the Military in general.<br />
TABLE I<br />
EDUCATIONAL LEVEL BY PAY GRADE<br />
                   EDUCATIONAL LEVEL<br />
PAY<br />
GRADE     1     2     3     4     5     6     7     8     9<br />
E1      .01   .60   .30   .01     0     0     0     0     0<br />
E2        0   .77   .63     0     0     0     0     0     0<br />
E3      .01   .51   .29   .04   .04   .10     0     0     0<br />
E4        0   .47   .38   .06   .04   .02  <.01  <.01     0<br />
E5        0   .32   .30   .10   .09   .08   .02   .01     0<br />
E6        0   .33   .33   .16   .09   .03   .05     0     0<br />
E7      .04   .13   .33   .15   .11   .15     0   .06   .02<br />
E8        0   .17   .22   .17     0   .33     0   .11     0<br />
E9        0   .19     0   .31   .25   .06     0   .19     0<br />
O1                        .25     0   .50   .12   .12     0<br />
O2                                    .78   .11   .11     0<br />
O3                                    .62   .15   .23     0<br />
O4                                    .09   .18   .72     0<br />
O5                              .11   .22   .22   .44     0<br />
EDUCATIONAL LEVEL 1 = No diploma<br />
EDUCATIONAL LEVEL 2 = High School/GED<br />
EDUCATIONAL LEVEL 3 = l-2 Years of College<br />
EDUCATIONAL LEVEL 4 = AA/AS Degree<br />
EDUCATIONAL LEVEL 5 = 3-4 Years of college<br />
EDUCATIONAL LEVEL 6 = BA/BS Degree<br />
EDUCATIONAL LEVEL 7 = Some graduate study<br />
EDUCATIONAL LEVEL 8 = Masters degree<br />
EDUCATIONAL LEVEL 9 = Doctorate<br />
CAREER PLANS<br />
When asked to describe their career plans, the enlisted population at the<br />
E1-E4 level indicated that they planned to leave after their current<br />
enlistment. E5s were almost evenly divided between remaining on active duty<br />
until retirement and being uncertain about leaving after their present<br />
enlistment. E6-E8 indicated that they planned to stay until retirement. The<br />
following table indicates career plans by branch of Service and pay grade.<br />
TABLE II<br />
CAREER PLANS<br />
PAY           1     2     3     4     5<br />
GRADE<br />
E1          .08   .08   .25   .50   .08<br />
E2          .11   .04   .21   .43   .21<br />
E3          .11   .05   .16   .43   .24<br />
E4          .12   .05   .10   .42   .31<br />
E5          .33   .05   .18   .30   .13<br />
E6          .70   .02   .08   .16   .03<br />
E8          .72   .11   .08   .05   .03<br />
E9          .96     0     0     0   .03<br />
ARMY        .20   .05   .09   .38   .27<br />
AIR FORCE   .41   .04   .12   .31   .11<br />
NAVY        .29   .04   .18   .29   .20<br />
MARINES     .40   .03   .08   .38   .11<br />
CAREER PLANS:<br />
l=Probably stay until retirement<br />
2=Stay beyond present obligation but not to retirement<br />
3=Probably stay beyond present obligation but not until retirement<br />
4=Probably leave after present obligation<br />
5=Definitely leave after present obligation<br />
While the majority of clients learned about CAGIS when they visited the<br />
Education Center or attended a briefing by the Education Center staff, about<br />
1/3 learned about CAGIS from a co-worker. This would indicate that the<br />
program was felt to be valuable enough to recommend it to a friend.<br />
Because part of the data collection was designed to determine the relative<br />
effectiveness of the two systems, several comparisons were made. Eighty-one<br />
percent of the sites received GIS, but only 45% of the User Surveys were<br />
returned from GIS sites. Clients spent more time using DISCOVER (29% spent 46<br />
to 60 minutes) than GIS users, who spent less time with that system (34% spent<br />
16 to 30 minutes). Seventy-one percent of the clients spent between sixteen and<br />
sixty minutes using the software. It is interesting to note that E3s and E6s<br />
are spending the most time using the computer. This appears to be a critical<br />
time for them in their careers. Eighty-one percent of the GIS users felt they<br />
understood the system they used, while 91 percent of the Discover users felt<br />
they understood that system.<br />
EFFECTIVENESS OF SYSTEMS<br />
In an attempt to determine the effectiveness of the two systems, clients were<br />
asked to compare the computer systems with other types of reference materials.<br />
The first question asked the client to rate the CAGIS information in relation<br />
to any other reference source. Eighty-two percent of the users rated CAGIS<br />
either superior or better than any other sources. Clients were then asked<br />
about the currency of the information, and again 77% rated the information<br />
either superior or decidedly more current.<br />
As a further measure of the effectiveness of the systems, the clients were<br />
asked if they talked with a counselor following their session on the computer,<br />
and if they did talk with a counselor did they feel better prepared. Forty<br />
percent of the users did not talk with a counselor after interacting with the<br />
software. Ninety-one percent of those that talked with a counselor stated<br />
that they were better prepared to talk with a counselor. Seventy-five percent<br />
of those that did not talk with a counselor felt that they did not need to<br />
talk with a counselor because the system had answered all their questions.<br />
These statements would indicate that the systems are maximizing the counselor<br />
resources by screening out those clients that were basically seeking only<br />
information. This frees the counselors to do counseling and relieves them of<br />
simple information giving.<br />
RANKING THE DATA BASES<br />
When asked to rank which data base they felt was most useful, clients<br />
ranked the following data bases as number one in these proportions:<br />
Civilian Careers              .35<br />
Undergraduate Degrees         .25<br />
Graduate Degrees              .13<br />
Military/Civilian Crosswalk   .11<br />
Financial Aid                 .07<br />
Military Careers              .06<br />
Resume                        .02<br />
OVERALL RATING OF CAGIS<br />
As a feature of the services provided by the education offices, users were<br />
asked to rate the CAGIS they used. For the two systems, the following ratings<br />
were assigned:<br />
TABLE III<br />
RATING<br />
              1     2     3     4     5<br />
GIS         .56   .37   .06  <.01  <.01<br />
DISCOVER    .62   .34   .04  <.01  <.01<br />
1=Essential  2=Important  3=Neutral  4=Not important  5=Not required<br />
CONCLUSIONS<br />
In reviewing the Statement of Purpose, the data support each statement.<br />
The systems are meeting the needs of the clients as evidenced by comparing the<br />
responses of the clients as to which data bases they used and how they<br />
ultimately ranked those data bases. This is also shown by the responses to<br />
the completeness and currency of the information provided.<br />
Clearly these systems are felt to be valuable resources. Over 90% of the<br />
users felt the systems were either essential or very important additions to<br />
the education centers.<br />
Cost effectiveness is difficult to determine, but when one notes the amount of<br />
time spent using the software and compares this to the hourly cost of a GS<br />
9/11 guidance counselor, it is apparent that money is being saved. With<br />
increased quantities, the price of the leases becomes even less. Site<br />
licenses also provide more software at an even greater saving. The fact that<br />
family members and DOD civilians can also access the systems at no additional<br />
cost further enhances the cost effectiveness.<br />
Better utilization of counselor time is apparent from the data indicating<br />
the number of users that did not need to meet with a counselor upon completion<br />
of their use of the software. The number of clients indicating that they were<br />
better prepared to meet with a counselor also allows the counselor to provide<br />
assistance with things other than simple information giving (e.g., information<br />
integration).<br />
RECOMMENDATIONS<br />
1. DANTES should seriously investigate the possibility of adding some of the<br />
requested additional data bases. Working with the vendors to incorporate<br />
these data bases into the existing systems should be relatively easy. The<br />
vendors have been asked to identify SOC schools in their next editions.<br />
2. As the personnel resources of the education centers are being drawn down<br />
and more Service members are being released from the Service, these systems<br />
should be expanded to reduce the quantity of personal counseling. Education<br />
centers should consider increasing their investment in computer hardware in<br />
order to expand their counseling efforts. When on-site scoring becomes<br />
possible, sites will want the capability to take advantage of this enhancement.<br />
DANTES plans to expand the program to approximately 250 sites, but only<br />
to those sites that are willing to participate in training and have the<br />
hardware available. Several Educational Services Officers stated that the<br />
addition of CAGIS was extremely valuable in augmenting their resources for<br />
Project Transition.<br />
3. Increasing the "user friendliness" should be a major objective of the<br />
vendors. While the information in the systems is valuable, difficulty in<br />
accessing the information diminishes the usefulness of the systems. The<br />
vendors need to be aware of this shortcoming and either provide additional<br />
training or provide more technical support to the education centers.<br />
4. Counselors need to overcome their reluctance to use the computers. Their<br />
resistance to becoming involved with computers is denying wider use of the<br />
systems. The counselors do not really know how much information each system<br />
has and in consequence do not take full advantage of the breadth of information<br />
available. An effort should be made to work with the counselors during workshops<br />
and national conventions.<br />
5. Training is essential to the success of the program. It is recommended<br />
that someone from DANTES attend each of the training sessions. This was not<br />
done for this portion of the pilot study and, in consequence, data collection<br />
was slow and many follow-up letters had to be written.<br />
6. The need for extensive and current data is apparent. Many outdated and/or<br />
hardbound references can be replaced by the CAGIS software. DANTES should<br />
consider distributing reference materials less frequently and rely more on the<br />
information available in the CAGIS software.<br />
SUMMARY<br />
The data from this expanded pilot study clearly substantiate the data from<br />
the initial pilot study. The systems have considerable value not only to the<br />
clients but also to the education center personnel. The systems are cost<br />
effective, up-to-date, thorough, and, most importantly, readily available.<br />
The users, ranging<br />
from active duty personnel to DOD civilians and family members indicated very<br />
strongly that this is an essential service.<br />
As the Military enters into a period of austere funding and personnel<br />
reductions, programs such as CAGIS will become increasingly important to help<br />
personnel make the transition back to the civilian workplace and higher<br />
education. Comments from program administrators clearly demonstrate the<br />
feeling that these programs are going to fill a very large gap in their<br />
services.<br />
Education centers need to move more rapidly into the world of automation and<br />
take advantage of the information explosion. These systems are going to make<br />
information retrieval instantaneous and eliminate hours of tedious research<br />
using hard cover reference materials.<br />
It would appear from the current data that these systems have value for all<br />
pay grades and all branches of Service. The program should be expanded to<br />
allow all sites that have a need to be able to access one of the systems.<br />
VERTICAL COHESION PATTERNS IN LIGHT INFANTRY UNITS'<br />
Cathie E. Alderks<br />
U.S. Army Research Institute for the<br />
Behavioral and Social Sciences<br />
Alexandria, VA<br />
Researchers have shown that strong cohesion among soldiers<br />
as well as cohesion within platoon level leadership teams has a<br />
consistent association with platoon performance and the ability<br />
to withstand stress (Siebold and Kelly, 1988a, 1988b). However,<br />
research pertaining to the impact of vertical cohesion up and<br />
down the chain of command on small unit performance is limited.<br />
In this paper the pattern of vertical cohesion from squad through<br />
company and its impact on performance at Army Combat Training<br />
Centers are examined.<br />
METHOD AND SAMPLE<br />
Data were collected by questionnaire from soldiers and<br />
leaders within five light infantry battalions (N = 60 platoons)<br />
at three points in time. The first point in time (Base) occurred<br />
4-6 months before the battalion was scheduled to go through a<br />
training rotation at either the U.S. Army National Training<br />
Center (NTC), Fort Irwin, CA, or the U.S. Army Joint Readiness<br />
Training Center (JRTC), Fort Chaffee, AR. The second point in<br />
time (Pre-rotation) was 2-4 weeks prior to the rotation; the<br />
third point (Post-rotation) occurred 2-4 weeks following the<br />
training rotation.<br />
Base and pre-rotation questionnaires were administered by<br />
researchers from the U.S. Army Research Institute to platoon<br />
level soldiers (squad members (SM), squad leaders (SL), platoon<br />
sergeants (PS), and platoon leaders (PL)) one company at a time<br />
in either a classroom or dayroom setting. Soldiers took<br />
approximately 30 minutes to complete the 160-item questionnaire<br />
after instructions. Soldiers responded on a machine readable<br />
answer sheet. Post-rotation questionnaires were given at the<br />
start of interviews in an office or dayroom setting to the<br />
following groups of soldiers within a company: 1) all PLs, 2) all<br />
PSs, 3) two-thirds of the SLs, and 4) all SMs from one intact<br />
squad in the company. Post-rotation questionnaires were short<br />
(21 items plus some unit and position identification questions)<br />
and took soldiers less than 10 minutes to complete; responses<br />
were made on the questionnaire itself.<br />
'The views expressed in this paper are those of the author<br />
and do not necessarily reflect the views of the U.S. Army<br />
Research Institute or the Department of the Army.<br />
Post Selection Board Analysis<br />
Post-selection board review of the 1986/87 NROTC scholarship<br />
year pointed to the need to build more structure into the<br />
evaluation system in order to (1) provide more consistency in<br />
the evaluation of records and (2) permit the selection of those<br />
who were truly best qualified in both an academic and potential<br />
officer sense.<br />
Assessment of the criteria used by board members to assign<br />
points to an application suggested that there was wide variance<br />
among board members in the value placed on the level of a<br />
student's academic or extracurricular performance and the type<br />
of student extracurricular activity. For example, some board<br />
members felt that athletic participation was essential for<br />
success as an officer; others did not. Applications were<br />
scored accordingly, with the resulting selection scores<br />
dependent upon the values of the particular selection board<br />
members assigned to review an application. This created the<br />
potential for wide variance in the scoring of similar<br />
applications by different selection boards.<br />
Analysis of the scores assigned by the weekly boards revealed<br />
that the average score awarded was over 80 points (out of<br />
100). This meant that weekly selection board members had very<br />
little ability to "reach down" to select an applicant who came<br />
to the selection process with a less competitive Quality Index,<br />
regardless of the merit of the applicant.<br />
Solution<br />
To address the problems of evaluation consistency and the<br />
extremely high average selection board score, a more formal<br />
method of application evaluation was instituted. Applicant<br />
evaluation categories were developed from observation of board<br />
member discussion during the initial weekly selection board<br />
sessions. Those areas that selection board members appeared to<br />
value consistently as most important when discriminating<br />
between competitive scholarship applicants were incorporated<br />
into a revised applicant evaluation system. Each evaluation<br />
category was also assigned a scoring level maximum. Optical<br />
Mark Reading (OMR) equipment was purchased and the NROTC<br />
Scholarship application was redesigned to be read by an optical<br />
scanner. Additionally, a formal selection board training<br />
program was developed to ensure that each weekly selection<br />
board began the selection board process with the same<br />
application evaluation guidance.<br />
This revised selection system was finalized during the summer<br />
of 1987 and used by the first weekly selection board of the<br />
1987/1988 NROTC program year. Each year, data based on<br />
selection board actions are reviewed and the system modified as<br />
The base and pre-rotation questionnaires contained items<br />
which formed scales measuring interpersonal, organizational, and<br />
leadership constructs (e.g., SM horizontal cohesion, job<br />
satisfaction, command climate, training effectiveness), as well<br />
as various demographic items. The post-rotation questionnaires<br />
focused mainly on soldier perceptions of performance during their<br />
recent rotation. In addition, for the two battalions which<br />
rotated through the JRTC, ratings on leader and platoon<br />
performance were provided just after the rotation by the<br />
observer/controllers (OCs) who observed each platoon during the<br />
rotation. In other words, the base and pre-rotation<br />
questionnaires contained the home station determinants<br />
(predictors) of performance; the post-questionnaires and the<br />
ratings from the OCs provided criterion measures for that<br />
performance.<br />
For the present paper, only Pre-rotation interpersonal<br />
scales and Post-rotation performance scores were considered.<br />
Platoon scores were obtained for each of the scales using a mean<br />
aggregate procedure. Standard scores were obtained to compare<br />
scores on the same scale.<br />
Vertical cohesion scales were used to examine the strength<br />
of each segment in each SM to Company Commander (CC) chain of<br />
command. These scales included 1) SM rating SL, 2) SM rating PS,<br />
3) SM rating PL, 4) SL rating PS, 5) SL rating PL, 6) PS rating<br />
PL, 7) PS rating CC, and 8) PL rating CC. It must be emphasized<br />
that in each case, a subordinate was rating a superior. These<br />
scales were composed of items such as "the leader treats us<br />
fairly", "the leader looks out for the welfare of his people",<br />
"the leader is friendly and approachable", "the leader pulls his<br />
share of the load in the field", and "the leader would have my<br />
confidence if we were in combat together". Scale item factor<br />
loadings (where N was sufficiently large to justify a factor<br />
analysis, i.e., sets of scale ratings by SMs and SLs) were<br />
.80-.87, .79-.86, and .80-.86 for SMs rating SLs, PSs, and PLs,<br />
respectively, and .65-.88 and .72-.90 for SLs rating PSs and PLs,<br />
respectively, with each scale forming independent factors.<br />
Performance scales were obtained from ratings of missions<br />
performed at JRTC/NTC. They were determined four ways: 1) OC<br />
ratings, 2) CC ratings, 3) Platoon ratings composed of the mean<br />
ratings of the PL, PS, SLs, and SMs with each level receiving a<br />
weight of one, and 4) Overall ratings composed of the mean<br />
ratings of the OCs, CC, PL, PS, SLs, and SMs with each level<br />
receiving a weight of one.<br />
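The equal-weight aggregation can be sketched as follows: each echelon's mean rating counts once, regardless of how many raters that echelon contains. The ratings below are invented for one hypothetical platoon:

```python
# Invented ratings for one platoon, grouped by rater position (echelon).
ratings_by_level = {
    "PL": [4.0],
    "PS": [3.5],
    "SL": [3.0, 4.0, 3.5],
    "SM": [2.5, 3.0, 3.5, 3.0],
}

# Mean within each echelon first, so large echelons do not dominate ...
level_means = {lvl: sum(r) / len(r) for lvl, r in ratings_by_level.items()}
# ... then each echelon mean receives a weight of one.
platoon_rating = sum(level_means.values()) / len(level_means)
print(f"platoon rating: {platoon_rating:.2f}")
```

The Overall rating works the same way with the OC and CC means added as two more equally weighted entries.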
Two approaches were chosen to examine vertical cohesion in<br />
the chain of command. The first approach was to identify the<br />
lowest break. The rationale was that since the lower leaders<br />
oversee the squad members who accomplish the direct fighting<br />
tasks, lower breaks in the chain of command might have a greater<br />
impact on direct platoon performance than breaks that occurred<br />
higher. The second approach was to count the total number of<br />
breaks that occurred anywhere in the chain of command. In both<br />
approaches, z-scores were computed to determine if and where<br />
breaks occurred. The decision rule for a break to occur required<br />
a z-score ≤ -.5 on the scale measuring cohesion between one<br />
position in the chain of command and a higher position. Where<br />
two or more scores for rating a particular leader were available<br />
(e.g., SMs, SLs, and PSs each rating PL), only one of the scores<br />
was required to meet the decision rule of z ≤ -.5. By example, a<br />
platoon could have a z ≤ -.5 at the SM-SL level and also at the<br />
PL-CC level. It would be included in the SL lowest break group<br />
and not considered for further lowest break groups. However, in<br />
counting the number of breaks, this platoon would be<br />
counted as having two breaks.<br />
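The decision rule and the two counting approaches can be sketched as follows. The z-scores and the single-platoon layout are invented; the example reproduces the case described above, a break at the SM-SL level and another at the PL-CC level:

```python
BREAK_Z = -0.5  # a break exists when any subordinate group's z-score is <= -.5

# Invented standardized cohesion ratings for one platoon, keyed by the
# leader being rated; each entry is a subordinate group's rating of that
# leader, with levels listed lowest in the chain of command first.
ratings = {
    "SL": {"SM-SL": -0.7},
    "PS": {"SM-PS": 0.1, "SL-PS": 0.3},
    "PL": {"SM-PL": 0.2, "SL-PL": 0.4, "PS-PL": 0.0},
    "CC": {"PS-CC": 0.2, "PL-CC": -0.6},
}

# A level is broken if ANY of the available scores meets the decision rule.
broken = [leader for leader, zs in ratings.items()
          if any(z <= BREAK_Z for z in zs.values())]

lowest_break = broken[0] if broken else "NONE"  # levels were listed lowest first
number_of_breaks = len(broken)
print(lowest_break, number_of_breaks)
```

This platoon falls in the SL lowest-break group and in the two-breaks group, matching the worked example in the text.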
RESULTS AND DISCUSSION<br />
The lowest break in the chain of command could occur at any<br />
point. Table 1 shows where the lowest level break occurs and<br />
lists the number of platoons per battalion in each of the<br />
categories.<br />
Table 1.<br />
Frequency at Which the     :  Frequency Distribution of Lowest<br />
Lowest Break Occurred      :  Break by Battalion<br />
                           :<br />
Level of      Platoon      :  Battalion   Level of Lowest Break<br />
Lowest Break  Freq.   %    :              SL  PS  PL  CC  NONE<br />
                           :<br />
SL             20    33    :      V        3   0   3   3   3<br />
PS             16    27    :      W        4   5   2   0   1<br />
PL             10    17    :      X        3   2   1   1   5<br />
CC              4     7    :      Y        6   4   1   0   1<br />
NONE           10    17    :      Z        4   5   3   0   0<br />
Table 2 gives similar information for the analysis approach<br />
considering the total number of breaks within each platoon<br />
focused chain of command. As there were four levels within each<br />
chain, a range of zero to four breaks was possible.<br />
Correlations indicating the relationship between the lowest<br />
break and the number of breaks in the vertical cohesion chain of<br />
command with the performance scales are listed in Table 3.
Table 2.<br />
<br />
Frequency of Total Number      :  Frequency Distribution for the<br />
of Breaks per Platoon          :  Number of Breaks by Battalion<br />
<br />
Total Number  Platoon          :  Battalion     Number of Breaks<br />
of Breaks     Freq.    %       :                0   1   2   3   4<br />
<br />
0              10     17       :      V         3   3   6   0   0<br />
1              17     29       :      W         1   3   6   2   0<br />
2              21     34       :      X         5   3   2   2   0<br />
3               9     15       :      Y         1   6   2   2   1<br />
4               3      5       :      Z         0   2   5   3   2<br />
Table 3. Lowest Break and Number of Breaks Correlated with<br />
Performance Measures<br />
<br />
Type of              Platoon Performance Rated By:<br />
Measure                  OC        CC        PLT<br />
------------------------------------------------------------<br />
OVERALL<br />
LOWEST BREAK            .37       .03       .33**   [illegible]<br />
NUMBER<br />
OF BREAKS         [illegible]   -.34*     -.37**     -.44***<br />
------------------------------------------------------------<br />
* p < .05    ** p < .01    *** p < .001<br />
Figures 1 and 2 illustrate the relationship between the mean<br />
overall performance scores and the lowest break and number of<br />
breaks conditions, respectively. Analysis of variance yields an<br />
F of 3.74, p < .01 for the data in Figure 1 and an F of 4.47,<br />
p < .004 for the data in Figure 2. Similar results were obtained<br />
using any of the other methods of computing the performance<br />
measures.<br />
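As an illustration of the one-way analysis of variance used here, the sketch below compares invented platoon performance z-scores grouped by position of lowest break. The group data are hypothetical and do not reproduce the reported F values; only the form of the test is shown.

```python
# One-way ANOVA sketch: mean overall performance (z-scored) compared
# across lowest-break groups.  All numbers below are invented.
from scipy.stats import f_oneway

perf_by_group = {                 # lowest-break group -> platoon z-scores
    "SL":   [-0.2, 0.1, -0.4],
    "PS":   [-0.9, -0.5, -0.7],
    "PL":   [-0.8, -0.6, -1.0],
    "CC":   [0.3, 0.5],
    "NONE": [0.6, 0.8, 0.7],
}
F, p = f_oneway(*perf_by_group.values())
print(f"F = {F:.2f}, p = {p:.4f}")
```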
Examination of Figure 1 reveals that platoon performance is<br />
most degraded when either the PS or the PL is at the position of<br />
the lowest break. Performance is better than average when the<br />
lowest break in vertical cohesion occurs at the CC level and is<br />
best when the vertical chain has no breaks at all. Since<br />
performance measurement was at the platoon level, some clouding<br />
Figure 1. Lowest Break in Vertical<br />
Cohesion by Mean Platoon Performance<br />
[Bar chart: mean platoon performance (z-score, about -0.5 to 0.75)<br />
by duty position of lowest break: SL, PS, PL, CC, NONE]<br />
Figure 2. Number of Breaks in Vertical<br />
Cohesion by Mean Platoon Performance<br />
[Bar chart: mean platoon performance (z-score, about -1 to 0.75)<br />
by number of breaks: 4, 3, 2, 1, 0]<br />
of the results occurred at the SL level of breaks. Seldom would<br />
one find all SM-SL links within a platoon equally rated.<br />
Therefore, taking an average SM-SL cohesion rating for the three<br />
squads within a platoon moderated particularly strong or weak<br />
links. A break at the SM-SL level would meet the z ≤ -.5<br />
criterion only if one or more of the SM-SL links were extremely<br />
weak. Nevertheless, good links in other squads could compensate<br />
and result in the platoon having acceptable performance. This<br />
and other explanations are being studied.<br />
Examination of Figure 2 reveals additional findings.<br />
Generally, the fewer cohesion breaks there are, the better the<br />
performance, with performance being best when there are no<br />
cohesion breaks at all. Performance is maintained at an average<br />
level with one or two breaks. Additional breaks in cohesion<br />
correspond to less than average performance.<br />
In summary, while a causal relationship cannot be inferred,<br />
it appears that the strength of vertical cohesion as measured<br />
prior to engagement is a good predictor of platoon performance at<br />
a Combat Training Center. Vertical cohesion appears most<br />
important to platoon performance at the top platoon leadership<br />
levels, those of PS and PL. Where cohesion breaks at this level,<br />
performance tends to be less effective. However, when vertical<br />
cohesion is strong (that is, when subordinates see their<br />
superiors as taking care of them and being skilled), performance<br />
is strong. These findings are important because they<br />
quantitatively confirm "common lore"; they suggest the cohesive<br />
strength of a chain can be measured; and they indicate that the<br />
success of any efforts to increase or maintain the strength of<br />
vertical cohesion in a platoon-focused chain of command can be<br />
assessed against a clear criterion measure.<br />
REFERENCES<br />
Siebold, G.L., and Kelly, D.R. (1988a). The impact of cohesion on<br />
platoon performance at the Joint Readiness Training Center.<br />
Technical Report 812. Alexandria, VA: U.S. Army Research<br />
Institute for the Behavioral and Social Sciences. ADA 202926.<br />
Siebold, G.L., and Kelly, D.R. (1988b). A measure of cohesion which<br />
predicts unit performance and ability to withstand stress.<br />
Proceedings: Sixth Users' Workshop on Combat Stress, San<br />
Antonio, TX, 30 Nov-4 Dec 1987. Consultation Report 88-003.<br />
Fort Sam Houston, TX: Health Care Studies and Clinical<br />
Investigation Activity, Health Services Command.<br />
THE USE OF INCENTIVES IN LIGHT INFANTRY UNITS'<br />
Twila J. Lindsay and Guy L. Siebold<br />
U.S. Army Research Institute for the<br />
Behavioral and Social Sciences<br />
The research described in this paper is part of a larger<br />
project to examine the home station determinants of subsequent<br />
small unit performance at U.S. Army Combat Training Centers.<br />
This paper focuses on describing the patterns of utilization of<br />
standard incentives in units and the extent to which these<br />
patterns were associated with other organizational variables and<br />
small unit performance. The incentives examined were "Public<br />
recognition for a job well done", "Passes", "Awards",<br />
"Specialized training courses", "Letters of appreciation or<br />
commendation", and "Promotions."<br />
METHOD AND SAMPLE<br />
Data were collected by questionnaire from soldiers within<br />
five light infantry battalions (N = 60 platoons) at three points<br />
in time. The first point in time (base) was 4-6 months before<br />
each battalion was scheduled to go through a training rotation at<br />
either the U.S. Army National Training Center (NTC), Fort Irwin,<br />
CA or the U.S. Army Joint Readiness Training Center (JRTC), Fort<br />
Chaffee, AR. The second point in time (pre-rotation) was 2-4<br />
weeks before the rotation; the third point (post-rotation) was<br />
about 2-4 weeks after the training rotation. There were two<br />
other sources of data: a) platoon mission performance ratings at<br />
JRTC by the platoon level observer/controllers (O/Cs) on 23 of<br />
the platoons, and b) company commanders' ratings of the mission<br />
performance of their subordinate combat platoons at NTC/JRTC.<br />
Base and pre-rotation questionnaires were given typically to<br />
all soldiers (squad members through platoon leader) in one<br />
company at one time in a classroom or dayroom setting. The<br />
soldiers responded on machine-readable answer sheets. The<br />
questionnaires consisted of about 160 items and took the average<br />
soldier about 30 minutes to complete after instructions.<br />
Post-rotation questionnaires were short (21 items plus some unit and<br />
position identification questions) and took soldiers less than 10<br />
minutes to complete; responses were made on the questionnaire<br />
itself. Post-rotation questionnaires were given at the start of<br />
group interviews to four separate groups of soldiers in a company<br />
(platoon leaders, platoon sergeants, squad leaders, and members<br />
of one intact squad). Post-rotation questionnaires, along with<br />
the subsequent group interviews, were usually given in an office<br />
or dayroom setting.<br />
The base and pre-rotation questionnaires contained items on<br />
'The views expressed in this paper are those of the authors and<br />
do not necessarily reflect the views of the U.S. Army Research<br />
Institute or the Department of the Army.<br />
incentive utilization, scales measuring important interpersonal<br />
and organizational constructs, and various demographic items.<br />
The post-rotation questionnaires focused on soldier perceptions<br />
(self ratings) of mission performance during their recent<br />
rotation. In other words, the base and pre-rotation<br />
questionnaires contained the home station determinants<br />
(predictors) of performance, including utilization of incentives;<br />
the post-rotation questionnaires (platoon self ratings) and<br />
ratings by the O/Cs and company commanders functioned as<br />
criterion measures of that performance.<br />
The analyses prepared for this paper focused on the<br />
responses from the squad members to the pre-rotation<br />
questionnaire which included a measure of incentive use. The<br />
soldiers assessed the utilization of each incentive; an<br />
aggregation of responses to the items was used to assess the<br />
total level of incentive utilization. The use of each incentive<br />
was assessed by a five point scale: 1 = seldom used, 2 = used<br />
occasionally, sometimes for the wrong people, 3 = used<br />
occasionally, given to the right people, 4 = used often,<br />
.sometimes given to the wrong people, 5 = used often, given to the<br />
right people. A two dimensional response scale was used due to<br />
shortage of questionnaire space.<br />
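A minimal sketch of the aggregation described above, with hypothetical ratings. The paper does not spell out the aggregation weights, so a simple mean of each member's item means, averaged across members, is assumed.

```python
# Platoon-level aggregate incentive utilization (hypothetical data).
# Each squad member rates six incentives on the 1-5 two-dimensional
# scale described in the text.
INCENTIVES = ["recognition", "passes", "awards",
              "training", "letters", "promotions"]


def platoon_incentive_score(member_ratings):
    """member_ratings: list of dicts mapping incentive name -> 1..5.
    Returns the platoon mean of each member's mean across the items."""
    member_means = [
        sum(r[i] for i in INCENTIVES) / len(INCENTIVES)
        for r in member_ratings
    ]
    return sum(member_means) / len(member_means)


# Two hypothetical squad members:
ratings = [
    {"recognition": 3, "passes": 4, "awards": 2,
     "training": 2, "letters": 1, "promotions": 3},
    {"recognition": 2, "passes": 5, "awards": 3,
     "training": 3, "letters": 2, "promotions": 2},
]
print(round(platoon_incentive_score(ratings), 2))  # 2.67
```

Platoon means computed this way are the values summarized in Table 1 below.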
RESULTS<br />
The distribution of overall individual squad member<br />
responses assessing the utilization of each incentive is<br />
illustrated in Figure 1. The figure indicates that giving<br />
"Passes" was the incentive most frequently utilized and the<br />
incentive most often given to the right soldier. The least<br />
utilized incentive was "Letters of appreciation or commendation."<br />
The incentive seen as most often given to the wrong person was<br />
"Promotions."<br />
This incentive utilization pattern was similar across the<br />
five battalions and for most companies. Most variation in the<br />
utilization patterns was across platoons. This finding may<br />
indicate that there was an attitudinal component to the ratings<br />
which may have biased their accuracy. Nonetheless, the overall<br />
responses of the soldiers, as well as the platoon mean<br />
utilization levels shown in Table 1, suggest that, on the whole,<br />
incentives are not as frequently or effectively utilized as they<br />
might be.<br />
A key focus of analysis in this research was to estimate the<br />
relationships between use of incentives, standard organizational<br />
variables, and platoon performance. The estimates of these<br />
relationships were needed to develop a working model of the<br />
interactions among the variables. Such a working model, in turn,<br />
was needed to develop a more thorough model for use in designing<br />
programs, tools, or interventions to enhance unit performance.<br />
In the analysis for this paper, the authors examined a set<br />
of standard organizational variables to find their relation to<br />
the use of incentives: 1) company learning climate, 2) job<br />
satisfaction, 3) platoon pride, 4) expectations that the NTC/JRTC<br />
rotation would be valuable training, 5) motivation for the<br />
FIGURE 1. UTILIZATION OF INCENTIVES BY SQUAD MEMBERS<br />
[Histograms of the percentage of squad members choosing each scale<br />
point (1-5) for each incentive: Public Recognition, Passes, Awards,<br />
Training Course, Letter of Appreciation, Promotion]<br />
KEY<br />
1 = seldom used<br />
2 = occasionally used, wrong person<br />
3 = occasionally used, right person<br />
4 = used often, wrong person<br />
5 = used often, right person<br />
Table 1. Overall Platoon Means and Standard Deviations for<br />
Utilization of Incentives (N = 60 platoons)<br />
<br />
Incentive                               Mean     SD<br />
<br />
Public recognition for a job<br />
  well done (1)                          2.6    .51<br />
Passes (2)                               2.8    .54<br />
Awards (3)                               2.5    .48<br />
Specialized training courses (4)         2.5    .52<br />
Letters of appreciation or<br />
  commendation (5)                       2.4    .52<br />
Promotions (6)                           2.6    .50<br />
Incentives - aggregated (7)              2.6    .42<br />
Table 2. Correlations Between Incentive Items and Organizational<br />
Variables and Performance Criteria<br />
<br />
Organizational Variables             Incentives (see Table 1)<br />
& Performance Criteria        (1)   (2)   (3)   (4)   (5)   (6)   (7)<br />
<br />
Learning Climate              .60   .53   .56   .52   .38   .56   .74<br />
Job Satisfaction              .61   .43   .62   .53   .49   .68   .71<br />
Platoon Pride                 .48   .54   .62   .40   .44   .46   .67<br />
NTC/JRTC Expectations         .61   .43   .46   .40   .54   .55   .61<br />
O/C Criterion Ratings         .19   .32   .45   .27   .18   .42   .39<br />
Company Commander Ratings    -.05   .06   .29   .22  -.08  -.03   .09<br />
Platoon Self Ratings          .09   .23   .25   .13   .005  .23   .19<br />
<br />
Note: N = 60 platoons for correlations in first four rows; all<br />
correlations = p<.01. For O/C Ratings, N = 23 platoons; r values<br />
of .32 or higher = p<.05.<br />
rotation, and 6) general job motivation. These variables were<br />
selected because it was felt that incentive utilization would<br />
affect or be affected by these variables and that the latter<br />
should directly impact upon unit performance. In particular, it<br />
was anticipated that the use of incentives would relate to both<br />
event (NTC/JRTC) motivation and general motivation.<br />
Table 2 presents platoon level correlations between the<br />
utilization of incentives (specifically and in the aggregate) and<br />
four key organizational variable scales. The reader will note<br />
that the use of incentives in the aggregate was more strongly<br />
correlated with the organizational variables than were the six<br />
specific incentives. Of the specific incentives, "Public<br />
recognition", "Awards", and "Promotions" were the more strongly<br />
correlated. Table 2 also presents the platoon level correlations<br />
between the utilization of incentives and the three types of<br />
performance criteria (O/C ratings, company commander ratings, and<br />
platoon ratings). While a few of the correlations reached<br />
statistical significance, the correlations are not that strong,<br />
particularly in comparison with those between NTC/JRTC motivation<br />
and unit performance or between platoon pride and performance<br />
(presented later). Thus, as suspected, the utilization of<br />
incentives seems not to be strongly associated with good unit<br />
performance but is strongly associated with other factors which<br />
more directly affect performance.<br />
Based on the pattern of highest inter-correlations and a<br />
little logic, the authors developed a tentative model describing<br />
how incentives might interact with other key organizational<br />
variables to impact upon platoon performance. The model, at this<br />
stage, must be considered only hypothetical; nevertheless, it<br />
provides a good starting point for subsequent inquiry. The model<br />
is portrayed in Figure 2.<br />
DISCUSSION<br />
While incentive utilization seems to play an important part<br />
in supporting variables directly impacting on unit performance,<br />
incentive utilization in the units examined was nonetheless low.<br />
This indicates both that leaders can more effectively use<br />
incentives and that, with more effective utilization, the<br />
numerical relationships found in this research might change.<br />
Since the aggregate use of incentives was more strongly<br />
correlated with important organizational variables than the<br />
individual incentives, leaders may be able to shift from the use<br />
of constrained or slow-to-process incentives, or ones that take<br />
the soldier away from the unit (passes), to the use of incentives<br />
which are more efficient or effective (e.g., public recognition<br />
and awards). In the post-rotation interviews, it was found that a<br />
major limitation on perceived incentive effectiveness was the<br />
length of time that elapsed between the act or basis for the<br />
incentive and actual receipt of the incentive. Simply put,<br />
incentives should be used more and processed more quickly. If<br />
this is done, unit performance should be significantly enhanced.<br />
[Path diagram: boxes labeled "Leadership" and "Learning Climate"<br />
with arrows to the other model variables; the remainder of the<br />
diagram is illegible in the original.]<br />
FIGURE 2. (TENTATIVE) INCENTIVE UTILIZATION IMPACT MODEL<br />
With Direct Inter-Scale Correlations<br />
<br />
VARIABLES                  b.   c.   d.   e.   f.   g.  h.O/C  h.CO  h.PLT<br />
a. Learning climate       .79  .74  .81  .60  .60  .71  .52**  .17   .30*<br />
b. Platoon pride               .67  .82  .53  .62  .74  .57**  .25   .26*<br />
c. Incentive utilization            .71  .61  .55  .55  .39*   .09   .19<br />
d. Job satisfaction                      .67  .74  .77  .65**  .23   .22*<br />
e. NTC/JRTC expectations                      .82  .55  .55** -.17   .08<br />
f. NTC/JRTC motivation                             .75  .65**  .16   .07<br />
g. Job motivation                                       .63**  .37*  .31**<br />
Number of platoons         60   60   60   60   60   60   23    42    58<br />
<br />
* r = p<.05    ** r = p<.01<br />
COHESION IN CONTEXT<br />
Guy L. Siebold<br />
U.S. Army Research Institute for the<br />
Behavioral and Social Sciences<br />
In the last few years, there has been a substantial amount<br />
of research on military unit cohesion. The research, by this<br />
author and others, has addressed some key questions: what is<br />
cohesion, how does it differ from similar constructs (e.g.,<br />
bonding and morale), how can it be measured, what impact does it<br />
have, and how does it change over time. However, left relatively<br />
unaddressed are the questions of how cohesion is associated with<br />
other major job related and organizational constructs and which<br />
of these constructs, relative to each other, really make a<br />
difference in organizational performance. The research presented<br />
in this paper was designed to start to answer these latter two,<br />
unaddressed questions. Specifically, the research examined the<br />
association between unit cohesion and unit performance directly<br />
and in the context of the platoon average degree of job<br />
satisfaction and platoon level of training proficiency.<br />
Method and Sample<br />
Data were collected by questionnaire from soldiers (squad<br />
members, squad leaders, platoon sergeants, and platoon leaders)<br />
within five light infantry battalions at three points in time.<br />
The first point in time (Base) was 4-6 months before the<br />
battalion was scheduled to go through a training rotation at<br />
either the U.S. Army National Training Center (NTC), Fort Irwin,<br />
CA or the U.S. Army Joint Readiness Training Center (JRTC), Fort<br />
Chaffee, AR. The second point in time (Pre-rotation) was 2-4<br />
weeks before the rotation; the third point (Post-rotation) was<br />
2-4 weeks after the training rotation. Questionnaires were<br />
administered by researchers from the U.S. Army Research<br />
Institute.<br />
Base and pre-rotation questionnaires were given typically to<br />
one company of soldiers at a time in a classroom or dayroom<br />
setting and, being up to 160 items long, took the average soldier<br />
about 30 minutes to complete after instructions. Soldiers<br />
responded on a machine readable answer sheet. Post-rotation<br />
questionnaires were short (21 items plus some unit and position<br />
identification questions) and took soldiers less than 10 minutes<br />
to complete; responses were made on the questionnaire itself.<br />
Post-rotation questionnaires were given at the start of<br />
The views expressed in this paper are those of the author<br />
and do not necessarily reflect the views of the U.S. Army<br />
Research Institute or the Department of the Army.<br />
interviews to groups of soldiers in a company. All the platoon<br />
leaders in a company were one group; all the platoon sergeants<br />
were a second group; two thirds of the squad leaders were a third<br />
group; and all squad members from one squad in the company formed<br />
a fourth group. Post-rotation questionnaires, along with the<br />
subsequent group interviews, were usually conducted in an office<br />
or dayroom setting.<br />
The base and pre-rotation questionnaires contained scales<br />
measuring cohesion and other job related and organizational<br />
constructs along with various demographic items. The<br />
post-rotation questionnaires focused on soldier perceptions (self<br />
ratings) of mission performance during their recent rotation. In<br />
addition, for the two battalions which rotated through the JRTC,<br />
ratings on leader and platoon performance were provided just<br />
after the rotation by the observer/controllers who observed each<br />
platoon during the rotation. In other words, the base and<br />
pre-rotation questionnaires contained the home station determinants<br />
(predictors) of performance; the post-rotation questionnaires and<br />
ratings from the observer/controllers functioned as criterion<br />
measures of that performance. The total sample from the 5 light<br />
infantry battalions was 60 platoons: 45 line platoons, 5 scout<br />
platoons, 5 mortar platoons, and 5 anti-tank platoons.<br />
Questionnaire items were structured to form scales measuring<br />
the constructs investigated. Scales addressed the following<br />
aspects of cohesion: squad member horizontal bonding, platoon<br />
leadership team (platoon leader, platoon sergeant, and squad<br />
leaders) horizontal bonding, vertical bonding between the squad<br />
members and the platoon leaders, platoon pride, and Army<br />
identification. Squad member horizontal bonding (SMHB) items<br />
measured whether squad members felt they cared about one another<br />
and worked together well as a team. Platoon leadership team<br />
horizontal bonding (LHB) items measured the extent to which the<br />
platoon leaders cared about one another and worked well together<br />
as a team. Vertical bonding (VB) items measured the extent to<br />
which subordinates felt their leaders were skilled and looked out<br />
for the needs of their subordinates. Platoon pride (PRIDE) items<br />
measured the extent to which members were proud of being in their<br />
platoon and played an important part in it. Army identification<br />
(AI) items measured the extent to which soldiers felt a part of<br />
the Army and that its successes were their successes. Soldiers<br />
responded to the cohesion questionnaire items using a five point,<br />
strongly agree to strongly disagree response scale.<br />
Scales also addressed constructs such as job motivation (job<br />
involvement), JRTC/NTC motivation, expectations of the value of<br />
JRTC/NTC training, job satisfaction, company learning climate,<br />
and level of task and mission training. As examples, the company<br />
learning climate items measured whether soldiers were given a lot<br />
of responsibility, got feedback on how they were doing, and were<br />
helped to learn from their mistakes; the job satisfaction items<br />
measured whether soldiers felt their work was interesting and<br />
useful. Most of the scales, or earlier versions of them, had<br />
been used in prior research and thus had known or expected<br />
characteristics.<br />
For criterion measures of platoon performance, the ratings<br />
of three groups were used: observer/controllers (OCs) at the<br />
JRTC, company commanders (COs) rating their three platoons, and<br />
the platoon members (PLT) themselves. The OC ratings were done<br />
at the JRTC after the rotation was completed; the CO and PLT<br />
ratings were made during post-rotation data collection. Each<br />
rater rated each platoon, about whose performance he was<br />
knowledgeable, on its performance during each mission conducted<br />
(e.g., movement to contact, deliberate attack, and defense). A<br />
rater's average rating of the platoon across observed missions<br />
became the criterion score. Raters used a 4 point scale:<br />
Trained, Needs a little training, Needs a lot of training,<br />
Untrained. PLT ratings were computed by averaging criterion<br />
scores across the four positions (squad member, squad leader,<br />
platoon sergeant, platoon leader), i.e., equally weighted by<br />
position. Readers can contact the author for additional<br />
information on any of the scales. The predictor data used for<br />
the analyses in this paper are from squad member responses only.<br />
Leader perspectives will be addressed in future analyses.<br />
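The criterion scoring just described can be sketched as follows. The numeric coding (4 = Trained down to 1 = Untrained) and the example ratings are assumptions for illustration; the paper does not state the coding direction.

```python
# Criterion-score computation sketch (hypothetical data).
# A rater's criterion score for a platoon is the mean of his mission
# ratings; PLT self-ratings average the four position means equally.


def rater_criterion(mission_ratings):
    """Average one rater's mission ratings (1-4) into a criterion score."""
    return sum(mission_ratings) / len(mission_ratings)


def plt_criterion(scores_by_position):
    """PLT self-rating: mean of the four position means, equally weighted."""
    position_means = [sum(v) / len(v) for v in scores_by_position.values()]
    return sum(position_means) / len(position_means)


oc = rater_criterion([3, 2, 4])        # e.g., movement to contact, attack, defense
plt = plt_criterion({
    "squad_member":     [3.0, 3.5],    # ratings from each position group
    "squad_leader":     [3.0],
    "platoon_sergeant": [3.5],
    "platoon_leader":   [4.0],
})
print(round(oc, 2), round(plt, 2))     # prints: 3.0 3.44
```

The equal weighting by position keeps the many squad-member responses from dominating the platoon self-rating.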
Results<br />
Scales. The questionnaire predictor scales used in this research<br />
typically had means of about 3.1 - 3.6 on the five point response<br />
scale, with standard deviations around 1.0 at the individual<br />
respondent level and around .5 as averaged at the platoon level.<br />
Scale reliability estimates (alpha values) were typically around<br />
the .8 level. The platoon performance criterion scales had the<br />
following characteristics: OC ratings - Mean = 2.1, SD = .41;<br />
CO ratings - Mean = 3.2, SD = .43; PLT ratings - Mean = 3.2, SD =<br />
.32. Numbers of platoons rated were: OC = 23; CO = 42; PLT = 59.<br />
Direct impact of cohesion. As noted in Table 1.a., all the<br />
aspects of cohesion correlated significantly with platoon<br />
performance as rated by the OCs at JRTC and as rated by the<br />
platoon members. The cohesion - performance relationship based<br />
on CO criteria was in the same direction but at a lower,<br />
non-significant level. Also, as noted in Table 1.b., the different<br />
aspects of cohesion all correlated significantly with each other,<br />
although at a notably lower correlation coefficient level with<br />
the Army identification aspect. An initial factor analysis of<br />
squad member responses indicated that Army identification was a<br />
separate construct from the others, that squad member bonding was<br />
a separate construct, and that the other scales were linked to<br />
perceptions about the platoon leaders. Platoon pride loadings<br />
were split between the squad member and leader factors.<br />
Relation of cohesion to other constructs. As noted in Table<br />
1.c., other standard organizational constructs and level of<br />
training were related to the cohesion scales. In short, there<br />
Table 1.a. Correlations Between Cohesion and Platoon Average<br />
Mission Performance at JRTC or NTC by Rater of Performance<br />
<br />
                      Performance Raters<br />
Cohesion Scale        OC       CO       PLT<br />
<br />
SMHB                 .52**    .31*     .30*<br />
LHB                  .52**    .15      .38**<br />
VB                   .47**    .18      .31**<br />
PRIDE                .57**    .25      .26*<br />
AI                   .43**    .20      .24*<br />
Table 1.b. Intercorrelations Among Cohesion Scales<br />
<br />
            LHB      VB     PRIDE     AI<br />
SMHB        .74      .63     .84      .54<br />
LHB                  .77     .81      .47<br />
VB                           .73      .45<br />
Table 1.c. Correlations Between Cohesion and Standard<br />
Organizational Constructs<br />
<br />
                                Cohesion Scale<br />
Construct                 SMHB    LHB    VB    PRIDE    AI<br />
<br />
Job Motivation             .65    .66   .51    .74     .71<br />
JRTC/NTC Motivation        .47    .49   .31    .62     .65<br />
Job Satisfaction           .67    .72   .59    .82     .69<br />
Learning Climate           .67    .81   .75    .79     .64<br />
Task/Mission Training      .30    .31   .48    .31     .29*<br />
<br />
Note: * = p<.05.<br />
was a great deal of inter-dependence among the predictor<br />
constructs. This, of course, led to some analytic concerns, in<br />
particular about whether the cohesion construct correlations with<br />
the criteria were independent or due to the influence of some<br />
underlying factor or other construct. An examination of all the<br />
construct inter-correlations and exploratory factor analyses<br />
suggested that there might be a soldier general perception of job<br />
conditions accounting for the level of inter-correlation. To<br />
investigate this possibility, the correlations were re-computed<br />
controlling for the mean platoon job satisfaction. The results<br />
are shown in Tables 2.a. and 2.b.<br />
Table 2.a. Partial Correlations Between Cohesion and Platoon<br />
Average Mission Performance at JRTC or NTC by Rater of<br />
Performance, Controlling for Job Satisfaction<br />
<br />
                      Performance Raters<br />
Cohesion Scale        OC       CO       PLT<br />
<br />
SMHB                 .16      .21      .21<br />
LHB                  .10     -.02      .32**<br />
VB                   .15      .05      .23*<br />
PRIDE                .11      .11      .13<br />
AI                  -.04      .06      .13<br />
<br />
* = p<.05; ** = p<.01<br />
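The partialling used here is the standard first-order partial correlation. The sketch below applies it to the SMHB-OC relationship; the job satisfaction-OC correlation of .65 is an assumed value (not reported in this section) chosen to show how a zero-order r of .52 can shrink to roughly the partial value in Table 2.a.

```python
# First-order partial correlation: r(x, y) controlling for z.
# SMHB-OC r = .52 and SMHB-job satisfaction r = .67 come from
# Tables 1.a and 1.c; the job satisfaction-OC r = .65 is assumed.
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z partialled out."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


print(round(partial_r(0.52, 0.67, 0.65), 2))  # 0.15, near the .16 in Table 2.a
```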
with) strong platoon performance at the JRTC or NTC. Among all<br />
the predictor constructs, JRTC/NTC motivation and job motivation<br />
were the strongest correlates with the OC criterion ratings, .65<br />
and .63 respectively. Their correlations were also reduced when<br />
job satisfaction was controlled, to .34 and .27.<br />
Regardless of the large common variance among the predictor<br />
construct scales, a critical concern was whether the common<br />
variance among the predictor scales was due in part to the level<br />
of task and mission training. If this were the case, then the<br />
correlations with the criteria ratings could be simply an<br />
instance of high or low training at pre-rotation resulting in<br />
high or low performance at JRTC or NTC. To examine this, partial<br />
correlations were again computed controlling for the squad<br />
members' pre-rotation estimates of their platoon's level of task<br />
and mission training (measured using the same response scale as<br />
the criterion raters). The results are given in Table 3.<br />
Table 3. Partial Correlations Between Cohesion and Platoon<br />
Average Mission Performance at JRTC or NTC by Rater of<br />
Performance, Controlling for Pre-rotation Training Level<br />
<br />
                      Performance Raters<br />
Cohesion Scale        OC       CO       PLT<br />
<br />
SMHB                 .49**    .32*     .22<br />
LHB                  .49*     .16      .30*<br />
VB                   .42*     .20      .18<br />
PRIDE                .55**    .27*     .17<br />
AI                   .39*     .21      .16<br />
<br />
* = p<.05; ** = p<.01          N =    20       39       55<br />
As Table 3 shows, the partial correlations (with perceived<br />
training level controlled) are not much different from the direct<br />
correlations given in Table 1.a. Also, there was little or no<br />
change from the direct correlations for the other major predictor<br />
constructs, with training controlled. A fair interpretation of<br />
Table 3 would be that cohesion adds significantly to the mission<br />
performance of platoons at training centers such as the JRTC and<br />
the NTC beyond that portion of performance due to level of<br />
training. In other words, cohesion and other job related and<br />
organizational constructs provide a separate, important<br />
contribution to performance. Speculating from the data in this<br />
research, one can estimate that separate contribution to be in<br />
the range of 10 - 40% of the performance variance. Obviously,<br />
further research remains to be done in sorting out the nature and<br />
inter-relationships of the predictor constructs and in<br />
determining the constructs' relationship with performance across<br />
the range of construct values (e.g., low, medium, and high levels<br />
of the constructs).
EVALUATION OF THE ARMY'S FINANCE SUPPORT COMMAND<br />
ORGANIZATIONAL CONCEPT<br />
Raymond O. Waldkoetter, William R. White, Sr., and<br />
Phillip L. Vandivier<br />
U.S. Army Soldier Support Center<br />
Fort Benjamin Harrison, IN 46216-5700<br />
A new modular concept of organization was developed for the Finance Support<br />
Command (FSC) missions/functions to provide direct financial support to commanders,<br />
units, and activities on an area basis. An Army restructuring initiative resulted in the reorganization<br />
of the Finance Corps’ force structure, with the planned FSC being a modular<br />
TOE that is sized, depending on the population supported, with two to six assigned finance<br />
detachments. Before implementing the new organizational concept, a decision was later<br />
made at Headquarters, Department of the Army to conduct an evaluation to determine if<br />
the modular concept FSC would have the capability to perform the minimum essential<br />
wartime tasks. Those wartime tasks place the FSC as a focal point for providing commercial<br />
vendor and contractual payments, various pay and disbursing services, and limited accounting<br />
on an area basis. Finance units must also be prepared to protect and defend themselves<br />
to continue sustainment of the force and maintain battle freedom for combat units to engage<br />
the enemy.<br />
A study team identified missions and functions to be performed by the FSCs in wartime.<br />
The relationships of the FSC with organizations above, below, and parallel were<br />
outlined, along with the interactions between these organizations. The study team established<br />
criteria to be met with current doctrine and “principles of support” and “standards of<br />
service” as the foundation for battlefield finance support functions. A notional concept was<br />
developed, staffed, and evaluated to determine the preferred FSC unit force structure. With<br />
the assistance of subject-matter experts (SMEs), the capability of the preferred FSC design<br />
and related functions was analyzed to address military finance support requirements for<br />
various theater scenarios, across the spectrum of conflict and in different geographical<br />
locations. Major Army commands (MACOMs) then concurred with the recommendation<br />
to adopt the modular concept FSC organization, with the proviso that the concept be duly<br />
evaluated prior to implementing actions.<br />
The Soldier Support Center (SSC) hosted a MACOM-level Finance Study Advisory<br />
Group (SAG), 30 June - 1 July 1988, to further assess the proposed modular organizational<br />
design. The SAG recommended fielding the modular design and conducting an on-site field<br />
evaluation of the design prior to world-wide implementation. The Department of the Army,<br />
Deputy Chief of Staff for Operations (DCSOPS), concurred with both recommendations.<br />
The views expressed in this paper are those of the authors and do not necessarily reflect the<br />
views of the Soldier Support Center or the Department of the Army.<br />
This field validation complied with mandatory guidance directing field validation for doctrinal,<br />
training, organizational, leadership, and materiel products before operational use<br />
(SSC, 1989). The field validation was to determine, then, if the modular organizational<br />
design was capable of supporting battlefield requirements (SSC, 1990a). Validation methodology<br />
was based on approved operational and training evaluation procedures and coordination<br />
of critical issues and criteria (TRADOC, 1987) with MACOMs and G3/J3 staffs, an<br />
integral part of the training, force structuring, and TOE approval process for finance units.<br />
METHOD<br />
The field validation was designed to be a “self-evaluation.” Evaluation materials<br />
were provided to the participating MACOMs who selected finance SMEs to observe the<br />
FSC unit structure, while the FSC was conducting an operational exercise or training and<br />
performing wartime tasks under simulated conditions (Thornton III & Cleveland, 1990).<br />
The SMEs were instructed to identify whether wartime missions/functions were in a category<br />
of “go”/“no go” or “unobserved,” according to the critical issues and related criteria<br />
they entered on the field validation data collection sheets. All “no go” situations were to be<br />
explained as to which factor caused failure, such as doctrine, leadership, materiel, training,<br />
or organization. Required guidance and advisory assistance were furnished by the SSC<br />
throughout the evaluation.<br />
Major characteristics of the modular concept were to be operationally exercised<br />
during the field validation. It was to be determined whether an acceptable level of wartime task-forcing<br />
and continuous operation was facilitated. Wartime and peacetime decentralized FSC<br />
detachment operations were to be effectively exercised with suitable support provided by<br />
the host unit. The designated issues and criteria were to be evaluated based on systematic<br />
SME observations during exercises or training. The SME evaluators were required to<br />
observe one FSC within their MACOM. The FSC and detachments were to be configured<br />
as described in the validation plan. It was requested every effort be made to control the<br />
wartime scenario so that realistic combat situations were experienced by the designated<br />
command and detachment personnel. The MACOM planning for a selected exercise/<br />
training sequence ensured that the SMEs were aware of the purpose of validation requirements<br />
and fully knowledgeable in finance wartime operations.<br />
The SMEs entered the FSC issues (three) and criteria (35, 21, and 2, respectively,<br />
per issue) on the data collection forms and were instructed to keep the issues and criteria in<br />
numerical sequence, providing then 58 possible rating observations having “go”/“no go” or<br />
“unobserved” alternatives. Eleven SME evaluators collected validation data with five participating<br />
in Korea (26-28 Jun 89) and the other six at Fort Hood, TX (20-22 Sep 89). The<br />
five SMEs in Korea were from the 175th Theater Finance Command, 176th Finance Support<br />
Unit, and the six at Fort Hood, TX were composed of two evaluators from Forces<br />
Command (FORSCOM) Headquarters, Finance and Accounting Division (Fort McPherson,<br />
GA) and four from Fort Hood, 3rd Finance Group, 502d Finance Support Unit. Three<br />
SSC referee-observers participated with the provisional FSC units in Korea and at Fort<br />
Hood to furnish whatever expertise might seem useful without causing any disruptive reactions.<br />
The three critical issues were formulated to cover all major concerns regarding the<br />
expected operational capability of the FSC organization:<br />
1. Can the modular concept FSC organization perform the technical wartime missions/functions?<br />
2. Can the modular concept FSC organization perform the tactical wartime missions/functions?<br />
3. Can the modular concept FSC organization transition from peace to wartime operations?<br />
With the criteria requirements subsumed for each issue indicating responses of “go”/<br />
“no go” or “unobserved~” and space for comments from the SME evaluators, the collected<br />
responses were tested for significance using chi-squared. Although simple majority judgments<br />
are often employed to make decisions when deliberating on courses of action to be<br />
selected, it was decided that due to the operational consequences, the decision to adopt the<br />
FSC organization should be based on significant data comparisons to avoid any random or<br />
possibly biased observations.<br />
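The goodness-of-fit test used throughout the results can be reproduced in a few lines. A minimal sketch, assuming (consistent with the reported statistics) a uniform expected distribution over the three response categories:

```python
import math

def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against a uniform expected
    distribution, with the exact p-value for df = 2 (three categories),
    where the chi-square survival function reduces to exp(-x/2)."""
    expected = sum(counts) / len(counts)
    stat = sum((obs - expected) ** 2 / expected for obs in counts)
    return stat, math.exp(-stat / 2)

# Issue 1 SME responses: 150 GO, 12 NO GO, 124 UNOBSERVED (N = 286)
stat, p = chi_square_uniform([150, 12, 124])
print(round(stat, 2), p < .001)   # 112.81 True
```

Running the same function on the Issue 2 counts (146, 17, 36) and Issue 3 counts (5, 1, 7) recovers statistics of about 146.2 and 4.31, matching the values reported below.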
RESULTS AND DISCUSSION<br />
The SME evaluators responded to Issue 1 and its criteria with 150 (GO), 12 (NO<br />
GO), and 124 (UNOBS) observations. Compared to the expected distribution of responses.<br />
it was found these responses were significantly different by chi-squared: X2 (2, N = 286) =<br />
112.81, p < .001. While there is definitely a significant difference among the three categories<br />
of responses, only the difference between the “go” and “no go” would be significant. Even<br />
though the difference between the “no go” and “unobserved” would be significant, the<br />
meaning could not be clear since many comments related to the “unobserved” responses<br />
implied that the technical wartime missions/functions were feasible (“go”). There was an<br />
overall impression that the technical wartime missions/functions can be performed, though<br />
equipment and certain procedures may act as constraints. For Issue 1 there were more<br />
“unobserved” responses than for the other two issues. Some “no go” responses resulted<br />
from observations of deficient transportation assets and of lack of sufficient staffing. “Unobserved”<br />
responses were further attributed to evaluators judging some tasks were feasible.<br />
but resources were not available to operate during the training and field exercises. Technology<br />
and staffing shortages were repeatedly cited as cause for non-evaluation (omitted) and<br />
“unobserved” responses, with some missions/functions tending to become evaluated as “no<br />
go” without specific available equipment/materiel. Again, many “unobserved” responses<br />
acknowledged the potential validity of the “go’s.”<br />
Issue 2 and its criteria showed evaluator responses of 146 (GO), 17 (NO GO), and 36<br />
(UNOBS). Compared to the expected distribution of responses, it was found these responses<br />
were significantly different by chi-squared: X2 (2, N = 199) = 146.25, p < .001.<br />
There is definitely a significant difference also among the three categories of responses<br />
here, and the other differences between the “go” and “no go” and “unobserved” are highly<br />
significant as well. There was a high degree of confidence that the tactical wartime missions/<br />
functions can be performed as a result of the observed field validation training/exercises.<br />
The “go’s” showed confidence related to maintaining unit strength and adequate logistical<br />
and communication support. Responses of “no go’s” and “unobserved” pointed up training<br />
and equipment concerns as did some non-evaluated (omitted) tasks. Company battlefield<br />
tasks were considered feasible but some evaluators mistakenly omitted replies. Medical<br />
care in the field, NBC, and Security problems were anticipated with the level of transportation<br />
serving as a crucial balance between “go” or “no go” decisions.<br />
Issue 3 and its criteria showed evaluator responses of 5 (GO), 1 (NO GO), and 7<br />
(UNOBS). Compared to the expected distribution of responses, it was found these responses<br />
were significantly different by chi-squared: X2 (2, N = 13) = 4.31, p < .15. There is<br />
a significant difference among the three categories of responses but not in favor of “go’s.”<br />
However, there was sufficient reason to conclude the FSC organization will be able to<br />
transition from peace to wartime operations. Some dissonance existed concerning how to<br />
best prepare for the transition process, but “go” responses from the Korean evaluators in<br />
the “most like” wartime setting, indicated no difficulties in preparing for the transition. The<br />
only “no go” reply disagreed with status of the TOE unit garrison structure as a basis from<br />
which to initiate an effective transition process. Some omitted responses resulted from the<br />
units not being able to clarify the intent of this issue. Enough observations did result to help<br />
modify guidelines for transitioning.<br />
The three issues and related criteria when summed showed evaluator responses of<br />
301 (GO), 30 (NO GO), and 167 (UNOBS). With 140 SME responses omitted due to<br />
equipment availability, lack of criteria clarity, or redundancy of meaning, 78% (498) of the<br />
possible 638 responses were recorded for the field validation. Compared to the expected<br />
distribution of responses, it was found these responses were significantly different by chi-squared:<br />
X2 (2, N = 498) = 221.22, p < .001. Comparisons showed the SME evaluator “go”<br />
responses were highly significant exceeding “no go” and “unobserved”, separately and<br />
combined. These results indicate that responses in favor of “go” judgments could hardly<br />
occur by chance, or by chance only once per thousand measures in similar data sets.<br />
CONCLUSIONS<br />
By aggregating the interrelated SME evaluator responses for the three issues and<br />
criteria, findings were derived describing useful observations to show the potential operational<br />
capability of the FSC modular organization.<br />
Comments from the 175th Theater Finance Command (Korea) forwarding their<br />
validation data supported the operational capability of the FSC Modular Concept. Suggestions<br />
were given to improve operations by planning to solve equipment, personnel, and<br />
transportation constraints.<br />
The 3rd Finance Group (Fort Hood, TX) evaluation comments indicated the FSC
operational capability was validated to perform most of the essential technical and tactical<br />
battlefield functions. It was noted that some deficiencies in communication equipment,<br />
transportation, and staffing could limit the FSC in performing its battlefield mission as<br />
described in the Finance Operations manual (FM 14-7, 1989). Also further noted by the 3rd<br />
Finance Group were possible problems in the FSC transition from a peacetime to wartime<br />
configuration, if it organizes and trains differently during peacetime than it is expected to<br />
operate in wartime.<br />
Comments submitted with the FORSCOM validation data elements pointed out<br />
“that the FSC is a sound structure to provide finance support to commanders, units, and<br />
soldiers.” It was noted, however, “the proposed TOE is not designed with sufficient assets<br />
(personnel, communication equipment, vehicles) to operate tactically in a dispersed mode.”<br />
From the visit by three SSC referee-observers in June 1989 to Korea, a trip report<br />
officially described from that early preview most of the results experienced by units in<br />
conducting later operational and training exercises to validate the FSC Modular Concept.<br />
Their findings generally anticipated from their critique and review of training and field<br />
exercises what other SME evaluators would experience. Based on the review and on-site<br />
Korea and Fort Hood visits, SSC observers agreed that the FSC can be expected to accomplish<br />
the minimum essential wartime tasks under the modular concept with minor modifications<br />
in staffing and equipment (SSC, 1990b). With suggested planning a smoother transition<br />
can be facilitated by the modular concept from peacetime to wartime operations.<br />
REFERENCES<br />
1. Finance Operations (FM 14-7). (1989). Washington, DC: Headquarters, Department of the Army.<br />
2. Thornton III, G. C., & Cleveland, J. N. (1990). Developing managerial talent through simulation.<br />
American Psychologist, 45, 190-199.<br />
3. U.S. Army Soldier Support Center (SSC). (1990a). Field Validation of the Finance Support Command<br />
(FSC) Modular Concept. Unpublished manuscript, Directorate of Combat Developments, Fort Harrison, IN.<br />
4. U.S. Army Soldier Support Center (SSC). (1990b). Finance Materiel Requirements Study. Fort Harrison,<br />
IN: Directorate of Combat Developments.<br />
5. U.S. Army Soldier Support Center (SSC). (1989). Personnel Service Command (PSC) and Finance<br />
Support Command (FSC) Field Validation Plan. Fort Harrison, IN: Directorate of Combat Developments.<br />
6. U.S. Army Training and Doctrine Command (TRADOC). (1987). Handbook for Operational Issues and<br />
Criteria. Fort Monroe, VA: Advanced Technology (Reston, VA).<br />
LEADER INITIATIVE: FROM DOCTRINE TO PRACTICE’<br />
Alma G. Steinberg and Julia A. Leaman<br />
U.S. Army Research Institute<br />
for the Behavioral and Social Sciences<br />
Introduction<br />
Initiative has been considered to be an important component of good leadership, especially military<br />
leadership (e.g., Headquarters Department of the Army, 1983; Rogers et al., 1982; Borman et al., 1987).<br />
However, there has been very little research on the actual practice of initiative by military leaders. This<br />
paper looks at leader initiative in Army combat units in terms of the relationship between leader initiative<br />
and unit performance, inhibitors of initiative, and approaches for developing leader initiative.<br />
Army doctrine is “what is written, approved by an appropriate authority and published concerning the<br />
conduct of military affairs” (Starry, 1984, p. 88). Two doctrinal publications define and describe leader<br />
initiative. One focuses on the Army’s doctrine for combat on the modern battlefield and is articulated in<br />
FM 100-5 (Headquarters Department of the Army, 1982). It reflects “the views of the major commands,<br />
selected Corps and Divisions and the German and Israeli Armies as well as TRADOC” (DePuy, 1984,<br />
p. 86). According to FM 100-5, initiative is something that large unit commanders must encourage in<br />
their subordinates. Initiative means to “act independently within the context of an overall plan,” “exploit<br />
successes boldly and take advantage of unforeseen opportunities,” “deviate from the expected course of<br />
battle without hesitation when opportunities arise to expedite the overall mission of the higher force,” and<br />
“take risks” (p. 2-2).<br />
The second doctrinal publication addressing the importance of leader initiative focuses on military<br />
leadership doctrine (Headquarters Department of the Army, 1983). Here initiative is defined as “the<br />
ability to take actions that you believe will accomplish unit goals without waiting for orders or<br />
supervision. It includes boldness” (p. 123). Emphasis is placed on the importance of communicating<br />
values, goals, and accurate information about the enemy and other factors that affect the mission to<br />
subordinates so that the subordinates, in turn, can use initiative to accomplish the mission when they<br />
are out of contact with the leader or higher headquarters.<br />
The data reported in this paper are from Army combat units. They were collected as part of a larger<br />
project conducted in support of the Center for Army Leadership and the Combined Arms Training<br />
Activity; the project focuses on determinants of small unit performance. Thus far, data have been<br />
collected from five light infantry battalions that went through rotations at the Army’s Combat Training<br />
Centers (CTCs). The goals of this project are to identify leadership and other factors important to unit<br />
effectiveness and readiness, and to develop interventions for improving these factors.<br />
Method<br />
The data presented here come from several sources. They include data collected from units just<br />
prior to their participation in a CTC rotation, data collected from units just after their participation in a<br />
CTC rotation, ratings of observer-controllers (OCs) at a CTC, and written take-home packages that<br />
provide feedback on unit performance at a CTC, as follows:<br />
(a) Pre-CTC questionnaire responses by squad members, squad leaders, platoon sergeants, and<br />
platoon leaders in battalions shortly before their CTC rotations.<br />
(b) OC Ratings of CTC performance for two battalions.<br />
‘The views expressed in this paper are those of the authors and do not necessarily reflect the<br />
views of the U.S. Army Research Institute or the Department of the Army.<br />
(c) Take-home package observations on CTC performance, by OCs, for 12 CTC rotations which<br />
took place during 1988, 1989, and 1990.<br />
(d) Individual and small group interview responses of squad members, squad leaders, platoon<br />
sergeants, platoon leaders, company commanders, battalion executive officers, and battalion<br />
commanders in five battalions shortly after they completed their CTC rotation.<br />
(e) Post-CTC questionnaire responses by squad members, squad leaders, platoon sergeants,<br />
platoon leaders, and company commanders in five battalions shortly after they completed their<br />
CTC rotation.<br />
1. Soldier views of initiative.<br />
Results<br />
Pre-CTC Questionnaires. Squad members, squad leaders, platoon sergeants, and platoon leaders<br />
from two battalions (n=600) were asked, “When the leaders in your unit talk about initiative, what do they<br />
typically mean?” About 85% of the respondents to this open-ended question indicated that initiative was<br />
seen as involving the performance of routine or SOP behavior, without being told and/or without being<br />
supervised. It involved the accomplishment of their own job or the job of the leader in his absence. The<br />
remaining 15% of the respondents indicated that when leaders encourage them to use initiative, they use<br />
initiative to mean: do what we tell you to do, do objectionable tasks (e.g., extra work, unpleasant tasks,<br />
low-level work), and/or make the leaders look good.<br />
Post-CTC Interviews. Most of the responses to the post-CTC interviews (from five battalions)<br />
indicated that the respondents felt that initiative involved the activity of carrying out their jobs or taking<br />
over for an absent leader. Initiative was seen as the initiation or continuation of behavior without being<br />
told or without the supervisor’s presence. Several of the examples of initiative that were given involved<br />
the recognition of a problem and the request to a higher level leader to be permitted to follow a different<br />
course of action (that was still within the scope of their job). In addition, there were a few incidents of<br />
initiative reported which involved the recognition that something beyond one’s own immediate<br />
responsibilities needed to be done, and personally performing the necessary tasks to get it done. When<br />
asked directly about the importance of encouraging subordinates to carry out the commander’s intent by<br />
exploiting successes boldly, taking advantage of unforeseen opportunities, deviating from the expected<br />
course of battle, and taking risks, respondents indicated that these were not really high priorities. From<br />
battalion commander on down, they said, in essence, the main thing is to have subordinates at each<br />
level who are well disciplined, technically and tactically competent, and are motivated to do their jobs<br />
well.<br />
2. The relationship between leader initiative and unit combat performance.<br />
Interview respondents indicated that they felt initiative (i.e., accomplishing the job without being told<br />
and/or without being supervised) is very important for success in combat. Pre-CTC questionnaires were<br />
examined to determine whether leader initiative was, in fact, a predictor of unit performance at a CTC.<br />
Table 1 shows that pre-CTC squad member ratings of the level of squad leader initiative are significantly<br />
related to OC ratings of platoon performance at a CTC. Similarly, pre-CTC squad member and squad<br />
leader ratings of platoon sergeant initiative are significantly related to OC ratings of platoon performance<br />
at CTC. However, pre-CTC ratings of platoon leader initiative by squad leaders, platoon sergeants, and<br />
company commanders are not significantly related to OC ratings of platoon performance at a CTC.<br />
Initiative, in the context of doing one’s job in the absence of being told or supervised, is not<br />
perceived as the same as motivation. For example, OC ratings of platoon motivation and platoon<br />
initiative at a CTC were not significantly correlated (r = .12). Neither were OC ratings of how hard the<br />
platoon worked and tried hard to do as good a job as possible significantly correlated with their ratings<br />
of platoon initiative (r = .19). Furthermore, OC ratings of platoon motivation and how hard the platoon<br />
worked were not significantly correlated with their ratings of platoon performance, whereas OC ratings of<br />
platoon initiative were related to OC ratings of performance (r = .45, p < .05). Even more support of the<br />
relationship between initiative and performance comes from the significant correlation (r = .44, p < .05)<br />
between OC ratings of platoon initiative at a CTC and post-CTC ratings of platoon performance by<br />
company commanders.<br />
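As a check on which correlations of this size reach significance, the critical r for n = 23 platoons can be derived from the standard t test for a Pearson correlation. A short sketch (the 2.08 critical t for 21 df is taken from standard tables, not from the paper):

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# With n = 23 (df = 21), the two-tailed .05 critical t is about 2.08, so the
# critical r is roughly .41: correlations of .41-.44 clear the threshold,
# while values of .31 and below do not.
r_crit = 2.08 / math.sqrt(21 + 2.08 ** 2)
print(round(r_crit, 2))   # 0.41
```

This is why, in Table 1, the squad-leader and platoon-sergeant ratings (.41-.44) are starred while the platoon-leader ratings (.31 and below) are not.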
Table 1. Correlations Between Leader Initiative Rated Pre-CTC and Platoon Performance at CTC<br />
Pre-CTC Initiative Ratings                       Correlations with Platoon<br />
                                                 Performance Rated by OCs<br />
Of squad leader by squad members                 r = .41*<br />
Of platoon sergeant by squad members             r = .44*<br />
Of platoon sergeant by squad leaders             r = .43*<br />
Of platoon leader by squad leaders               r = .31<br />
Of platoon leader by platoon sergeant            r = .25<br />
Of platoon leader by company commander           r = -.13<br />
* p < .05; n = 23 platoons<br />
3. Inhibitors of initiative.<br />
In doctrine (Headquarters Department of the Army, 1983), identified inhibitors of initiative are: lack of<br />
understanding the mission, lack of accurate information, and lack of understanding the frame of<br />
reference (i.e., values, goals, and way of thinking) of the higher level leader and the subordinates. Figure<br />
1 provides a summary of the inhibitors of initiative mentioned in the CTC take-home packages, the post-<br />
CTC interviews, and the post-CTC questionnaires. As can be seen from Figure 1, the reported inhibitors<br />
of initiative cover a broad range of areas and provide additional inhibitors to those identified in doctrine.<br />
These include micromanagement, unit climate, concern about the reaction of others, fatigue, and lack of<br />
motivation.<br />
4. Approaches for developing initiative in subordinates.<br />
In post-CTC interviews, leaders (squad leaders, platoon sergeants, platoon leaders, company<br />
commanders, battalion commanders) indicated that they do try to develop initiative in subordinates.<br />
They focus primarily at the squad member and squad leader levels and use the following approaches to<br />
develop initiative:<br />
(a) They develop the prerequisites. Leaders frequently mentioned three areas that they felt were<br />
prerequisites for showing good initiative: good discipline, proficiency in performing the job, and<br />
self-confidence. They try to develop the first two with training and the third, confidence, through<br />
physical training (PT).<br />
(b) They tell subordinates to show initiative.<br />
(c) They provide opportunities for subordinates to perform the role of their leader. Typically<br />
squad members are told to take over for their squad leaders, either as temporary fill-ins or for<br />
developmental purposes.<br />
(d) They reward initiative. Those showing exceptional initiative during training exercises are<br />
nominated for awards.<br />
Figure 1. INHIBITORS OF INITIATIVE<br />
By Source of Information<br />
CTC Take-home Packages<br />
- lack of information<br />
- poor operations orders<br />
- micromanagement<br />
Post-CTC Interviews<br />
- micromanagement and lack of trust<br />
- lack of information<br />
- not permitted<br />
- climate (lack of support for initiative, don’t make mistakes)<br />
- missions involving the larger unit (e.g., Bn as opposed to Plt)<br />
- no opportunity (e.g., in dead tent, OC restrictions)<br />
- it’s safer for your career and shows more loyalty if you don’t<br />
let the higher leader know his ideas aren’t good<br />
Post-CTC Questionnaire (n = 322)<br />
34% Lack of relevant information<br />
31% Fatigue<br />
29% Concern about superior’s reaction<br />
24% Lack of understanding the mission<br />
20% Lack of a clear solution to the problem<br />
17% Lack of motivation<br />
16% Fear of making a mistake<br />
6% Desire to avoid being noticed<br />
5% Concern about subordinate’s reaction<br />
12% Other (e.g., too much changing of missions, inexperienced<br />
leaders, lack of time due to changes in plans or late<br />
operations orders, micromanagement)<br />
(NOTE: Percents do not add up to 100% because respondents were<br />
instructed to indicate all reasons that applied.)<br />
Conclusion<br />
This paper focused on leader initiative and looked at the relationship of leader initiative to unit<br />
performance. In this context, both doctrinal and field views of initiative were presented. Initiative, in the<br />
sense of doing one’s job without being told and/or being supervised, is seen as very important by<br />
soldiers and leaders. It was shown that leader initiative is significantly correlated with unit performance,<br />
and yet is clearly distinguishable from motivation. The fact that field views of the inhibitors of initiative<br />
are broader than those presented in doctrinal sources suggests that the information gained in this<br />
research might be of benefit to doctrinal proponents as well as those developing leader training courses<br />
or conducting field training.<br />
References<br />
Borman, W. C., Motowidlo, S. J., Rose, S. R., & Hanser, L. M. (1987). Development of a model of soldier<br />
effectiveness (ARI Technical Report 741). Alexandria, VA: U.S. Army Research Institute.<br />
DePuy, W. E. (1984). Letter, General W. E. DePuy to General Fred C. Weyand, Chief of Staff, Army, 18<br />
February 1976. In John L. Romjue, From active defense to airland battle: The development<br />
of Army Doctrine 1973-1982. Fort Monroe, VA: U.S. Army Training and Doctrine Command.<br />
Headquarters Department of the Army. (1982). Operations (Field Manual 100-5). Washington, DC:<br />
Department of the Army.<br />
Headquarters Department of the Army. (1983). Military Leadership (Field Manual 22-100).<br />
Rogers, R. W., Lilley, L. W., Wellins, R. S., Fischl, M. A., & Burke, W. P. (1982). Development of the<br />
(ARI Technical Report 560). Alexandria, VA: U.S. Army Research Institute.<br />
Starry, D. A. (1984). Commanders Notes No. 3, Operational concepts and doctrine, 20 February 1979. In<br />
John L. Romjue, From active defense to airland battle: The development of Army Doctrine 1973-1982.<br />
Fort Monroe, VA: U.S. Army Training and Doctrine Command.<br />
STARTING A TQM PROGRAM IN AN R&D ORGANIZATION<br />
Herbert J. Clark<br />
Brooks Air Force Base, Texas<br />
This paper reports the results of implementing a Total Quality<br />
Management (TQM) Program in an Air Force research and development<br />
laboratory. It outlines how the Methodology for Generating<br />
Efficiency and Effectiveness Measures (MGEEM) was used to<br />
implement TQM, and describes the lessons learned in the process.<br />
The paper also gives guidelines for starting a TQM program and<br />
recommends using Organizational Development (OD) intervention<br />
techniques to gain acceptance of the program. Lessons learned<br />
stress the importance of choosing a skilled TQM facilitator,<br />
adequately training process action teams, and fostering open<br />
communications and teamwork to reduce resistance to change.<br />
GETTING STARTED<br />
People report they read the popular literature or hear a TQM<br />
briefing and come away with a general understanding of TQM<br />
philosophy, but no specific directions on how to get started.<br />
This condition is so common that, according to Kanji (1990), it<br />
even has a name: 'Total Quality Paralysis!'<br />
Kanji's solution for overcoming this problem is to follow a four-stage<br />
TQM implementation procedure. It consists of collecting<br />
organizational information, getting top management support for<br />
TQM, developing an improvement plan, and starting new initiatives.<br />
Following these four steps leads to commitment from the top, a<br />
united and coordinated middle management, and the data to make<br />
informed decisions -- essential conditions for TQM success.<br />
Behavioral scientists writing in the OD literature recommend<br />
similar procedures and have developed intervention techniques for<br />
gaining management support for new initiatives. They consist of<br />
educational activities, questionnaires, team-building exercises,<br />
and prescriptions of 'things to do' and 'things not to do.'<br />
French and Bell (1984) describe five types of interventions which<br />
range from working with whole organizations to working with teams<br />
and individuals. These interventions can be used in TQM programs<br />
to increase participative management and intergroup cooperation.<br />
Coupled with TQM tools such as statistical process control, they<br />
can lead to increased productivity, better product quality, and<br />
enhanced customer satisfaction. Trying to introduce TQM without<br />
considering the behavioral dynamics of the organization<br />
significantly reduces the chances for success, as illustrated<br />
below.<br />
AN ILLUSTRATION<br />
In 1988, the Air Force Human Resources Laboratory (AFHRL) started<br />
a Total Quality Management (TQM) program. The<br />
technique used to implement TQM was the Methodology for Generating<br />
Efficiency and Effectiveness Measures (MGEEM) described by Tuttle<br />
and Weaver (1986). MGEEM uses a group decision making technique<br />
to clarify an organization's mission, identify its customers,<br />
specify Key Result Areas (KRAs), and measure progress in the KRAs<br />
using mission effectiveness indicators. Air Force Regulation 25-5<br />
recommends using MGEEM to do TQM.<br />
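The MGEEM artifacts described above can be pictured as a small data structure: a mission is broken into Key Result Areas, each serving a customer and tracked by effectiveness indicators. The following is a hypothetical sketch; the class and field names, the example KRA, and the reading value are all illustrative, not taken from Tuttle and Weaver (1986).<br />

```python
from dataclasses import dataclass, field

# Hypothetical sketch of MGEEM outputs: KRAs with customers and
# effectiveness indicators. All names and values are illustrative.

@dataclass
class Indicator:
    name: str
    readings: list = field(default_factory=list)  # periodic measurements

    def latest(self):
        """Most recent reading, or None if nothing measured yet."""
        return self.readings[-1] if self.readings else None

@dataclass
class KeyResultArea:
    title: str
    customer: str                                  # who the KRA serves
    indicators: list = field(default_factory=list)

kra = KeyResultArea(
    title="Timely delivery of research reports",
    customer="Air Staff sponsors",
    indicators=[Indicator("percent of reports delivered on schedule")],
)
kra.indicators[0].readings.append(82.0)
print(kra.indicators[0].latest())  # most recent reading for the indicator
```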
Despite top management support, the reaction to starting MGEEM at<br />
AFHRL was negative. Had there been a vote at the TQM start-up<br />
meeting, it is unlikely that a majority of the laboratory staff<br />
would have endorsed implementing TQM or MGEEM. The commander saw<br />
no reasonable alternative to MGEEM, however, so he directed its<br />
implementation.<br />
Twenty months after the program began, support for MGEEM was still<br />
weak. Of the 94 (out of 380) people answering a laboratory TQM<br />
newsletter survey, 80% said TQM/MGEEM was of 'No Value' or 'Some Value.'<br />
Only 20% said it was of 'Moderate Value' or 'Significant Value.'<br />
Several written replies said to stop MGEEM. The attitude toward<br />
the TQM philosophy was more positive.<br />
MGEEM was rejected because people in the laboratory did not have a<br />
sense of ownership in the program. Although division-level<br />
management participated in selecting the MGEEM KRAs and<br />
indicators, they did not support using MGEEM in an R&D laboratory.<br />
This attitude was passed on to lower levels of management, so few<br />
people supported the program. This attitude prevailed, even though<br />
several of the scientists in the laboratory helped develop MGEEM.<br />
The finding that MGEEM was not widely accepted at AFHRL does not<br />
mean that it is an ineffective technique for implementing TQM.<br />
Some observers felt that MGEEM was rejected prematurely and did<br />
not receive a fair test. Others felt that its rejection may have<br />
been more a consequence of how management introduced MGEEM than<br />
its methodology.<br />
Had AFHRL used OD intervention techniques while implementing TQM,<br />
it is possible that they would have chosen a more acceptable TQM<br />
approach. Three OD techniques which could have been applied are<br />
survey feedback, the confrontation meeting (Beckhard, 1967), and<br />
work teams. Advantages to this approach are that problem<br />
identification is based on survey data; top management and work<br />
teams define the problems and propose solutions; middle<br />
management and workers develop the specific TQM procedures; and<br />
the survey data provide a reference point for surveys administered<br />
after changes have been made.<br />
When employed by a skilled facilitator, these techniques increase<br />
the chance of everyone developing a sense of ownership in the<br />
procedures adopted. TQM tools, such as cause and effect diagrams,<br />
are used to examine the processes associated with product quality<br />
after the group has accepted the need for change.<br />
LESSONS LEARNED<br />
In December 1989, Clark (1989) reported several lessons learned<br />
during the TQM program at AFHRL. The following is a summary of<br />
additional lessons learned.<br />
Facilitators. Facilitators must be familiar with TQM quality<br />
improvement procedures and with OD techniques for gaining program<br />
acceptance. Facilitators should also be able to train people in<br />
TQM and OD. It is best to use facilitators who are not a part of<br />
the management group that is initiating TQM. Facilitators need<br />
the independence and authority to run the program as approved.<br />
Organizations sometimes appoint their own facilitator and conduct<br />
a do-it-yourself TQM program. An alternative is to hire a full-time,<br />
thoroughly trained facilitator from outside the<br />
organization who can offer TQM alternatives. Facilitators should<br />
not impose their own philosophy on an organization or direct a<br />
specific TQM approach. The organization should develop its own TQM<br />
approach based on its unique requirements.<br />
Process Action Teams (PATs). The role of PATs in a TQM program is<br />
to examine manufacturing and administrative processes and improve<br />
the quality of service to the customer. Twenty months into the<br />
TQM program at AFHRL, 380 people were asked: How<br />
valuable are the process action teams at AFHRL? Twenty-one<br />
percent of the 94 people answering said, 'No Value.' Thirty-five<br />
percent said, 'Some Value'; 31% said, 'Moderate Value'; and<br />
13% said, 'Significant Value.' These results were surprising<br />
because, throughout the TQM program, people said the PATs were the<br />
most effective and worthwhile part of the program. We expected<br />
more people to say the PATs were of significant value.<br />
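The figures above can be checked with a little arithmetic: 94 of 380 people responded, and the four value ratings should sum to 100% of respondents. A minimal sketch:<br />

```python
# Back-of-the-envelope check on the survey figures reported above:
# 94 of 380 people responded, and the ratings cover all respondents.
responses, population = 94, 380
ratings = {"No Value": 21, "Some Value": 35,
           "Moderate Value": 31, "Significant Value": 13}  # percent of the 94

response_rate = 100 * responses / population
print(round(response_rate, 1))   # roughly a one-in-four response rate
print(sum(ratings.values()))     # ratings sum to 100 percent

low = ratings["No Value"] + ratings["Some Value"]
print(low)                       # share rating the PATs at 'Some Value' or below
```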
Written comments from the survey showed that people who said<br />
PATs were of no value were either not aware of what the PATs were<br />
doing, felt the PATs created too much bureaucratic busy work, or<br />
thought the PATs were not addressing the right problems.<br />
People who rated them highly said the PATs increased<br />
communications, involved people from lower levels, and proposed<br />
effective solutions to problems.<br />
Most PATs at AFHRL worked on improving administrative procedures.<br />
There was less progress in improving the quality of the laboratory<br />
R&D product and customer satisfaction. PATs should spend a major<br />
portion of their time working on product improvement and customer<br />
satisfaction. Excessive attention to administrative procedures<br />
can be a symptom of undue concern about management and too little<br />
concern about customer satisfaction and product quality.<br />
PATs are not the solution to all problems. It is easy to defer<br />
decisions to a committee without exercising leadership. Some<br />
problems sent to PATs could be easily solved by management in half<br />
the time.<br />
Training. Typical TQM training programs consist of lectures on<br />
the philosophies of Deming, Juran, Crosby, and other well-known<br />
quality advocates. There should be additional training on such<br />
subjects as participative management, customer interface, process<br />
control applications to non-manufacturing activities, and<br />
statistical analysis.<br />
Process Action Teams need training on group participation skills,<br />
brainstorming, cause and effect diagrams, and other TQM tools.<br />
Most experts orient this training towards the task at hand, rather<br />
than towards people's feelings and personalities. Training in OD<br />
intervention techniques comes after management has decided which<br />
OD techniques to apply. Unless people receive specialized<br />
training in OD and TQM, they do not know how to get underway.<br />
Communications. Good communication is fundamental to the success<br />
of any TQM program. Yet, many organizations have poor<br />
communications. Upward communication is poor because managers<br />
fail to listen. Downward communication is poor because managers<br />
want to protect their workers from what they consider to be<br />
irrelevant information. The result is a communication gap between<br />
managers and workers. Changes to this pattern can come about by<br />
recognizing the problem and training new behaviors through<br />
classroom discussion and leadership example.<br />
Some organizations increase communications, openness, and teamwork<br />
by using newsletters. AFHRL started a newsletter halfway through<br />
its TQM program. The newsletter was distributed each month and<br />
invited everyone's participation. Informal conversations<br />
indicated there may have been more TQM discussions in the<br />
laboratory because of the newsletter. A newsletter keeps the<br />
importance of quality and productivity gains visible to management<br />
and employees.<br />
Measuring Quality and Productivity. One way to measure quality and<br />
productivity in an R&D organization is to establish customer<br />
requirements, set goals, and measure progress towards reaching<br />
those goals in cooperation with the customer. Although this method<br />
is more appropriate for applied R&D projects than for basic<br />
research, it can be used for both. In an R&D organization, there<br />
is usually less emphasis on measuring scientific progress through<br />
use of the traditional TQM statistical process control techniques.<br />
Surveys can be used to measure customer satisfaction.<br />
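The goal-tracking approach described above (agree on a requirement, set a numeric goal, measure progress each review period) can be sketched as a small function. This is an illustrative sketch; the metric, baseline, and goal values are hypothetical, not from the paper.<br />

```python
# Minimal sketch of measuring progress toward a customer-agreed goal,
# as described in the text. All numbers below are hypothetical.

def progress(current: float, baseline: float, goal: float) -> float:
    """Fraction of the distance from baseline to goal achieved so far."""
    if goal == baseline:
        return 1.0  # goal already met at baseline; nothing to close
    return (current - baseline) / (goal - baseline)

# e.g. a customer-satisfaction survey score: baseline 3.1, goal 4.0,
# latest measurement 3.7 -> about two-thirds of the way to the goal
print(round(progress(3.7, 3.1, 4.0), 2))
```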
Resistance to TQM. Some people resist any type of organizational<br />
change. They do not want to start a TQM program or any other<br />
program. They just want to be left alone to do their work.<br />
Others fear a loss of responsibility, while still others fear they<br />
may get some. Reactions range from outright argument against TQM<br />
to stonewalling and simply waiting out current management.<br />
Management must listen, but also lead. If data show that<br />
organizational problems exist, open discussions should take place;<br />
but it is up to top management to lead the organization. This<br />
does not prevent the use of OD techniques. In fact, the greater<br />
the problem and resistance, the greater is the need for OD. People<br />
become believers based on the enthusiasm, examples, ideas, and<br />
data presented by management.<br />
Whatever TQM strategy and tactics are adopted, they must be<br />
reviewed and updated at least once each year. This action<br />
accommodates criticism and conveys a sense of continually striving<br />
for improvement and acceptance of the procedures adopted.<br />
Labels. Labels, such as MGEEM, TQM, MBO, and Zero Defects, can<br />
easily become scapegoats for people dissatisfied with a new<br />
management initiative. One way around this is to avoid using<br />
labels. The NASA Lewis Research Center, for example, calls its<br />
quality improvement program just that, a quality improvement<br />
program (Office of Management and Budget, 1990). Although NASA<br />
uses Deming principles and the ideas of other TQM experts, they<br />
intentionally avoid referring to their program as a Deming program<br />
or a TQM program. Their program is a combination of quality<br />
initiatives uniquely patterned for their organization. This may<br />
be a good policy to adopt, since it can be more difficult to argue<br />
against a quality improvement program than a specific TQM program<br />
with a label.<br />
FADS<br />
Many who read this paper will be familiar with the long list of<br />
publications which tell how to improve organizational<br />
productivity, quality, and morale. A particularly good summary of<br />
fads has been published by John Byrne (1986). He tells in a very<br />
entertaining way how fads come and go, and what are the latest<br />
fads. He says that too many modern managers are like compulsive<br />
dieters: trying the latest craze for a few days, then moving on<br />
(p. 58).<br />
The theme of this paper is that things do not have to be that<br />
way. An initiative to increase productivity and quality can<br />
succeed and endure if people in the organization buy into it.<br />
First, they have to believe they need a change; then they have to<br />
agree to participate in the program. Because people are different<br />
and organizations are different, the approach must be tailored to<br />
the organization.<br />
Success requires a qualified facilitator or change agent who can<br />
teach people how to work as teams. Additionally, all levels of<br />
management must endorse and actively sponsor the management<br />
change. Workers must have goals which are consistent with the<br />
overall goals of management. OD techniques can help gain the<br />
trust and cooperation needed to sustain a TQM program.<br />
All this takes time, patience, and considerable skill.<br />
If TQM does not work as promised, we may have to admit that<br />
programs which rely on people's good will just won't work. As Ring<br />
Lardner, Jr. (1990) said about Communism in Eastern Europe:<br />
Communism, like Christianity, is good in theory but, given human<br />
nature, hard to put into practice. Perhaps the same can be said<br />
about TQM.<br />
REFERENCES<br />
Beckhard, R. (1967, March-April). The confrontation meeting.<br />
Harvard Business Review, 45, 149-155.<br />
Byrne, J.A. (1986, January). Business fads: What's in - and out?<br />
Business Week, pp. 52-58.<br />
Clark, H.J. (1989, December). Total quality management: An<br />
application in a research and development laboratory (AFHRL-TP-85-<br />
58, AD-A215 808). Brooks AFB, TX: Special Projects Office, Air<br />
Force Human Resources Laboratory.<br />
French, W., & Bell, C.H., Jr. (1984). Organization development:<br />
Behavioral science interventions for organization improvement.<br />
Englewood Cliffs, NJ: Prentice-Hall.<br />
Kanji, G.K. (1990). Total quality management: The second<br />
industrial revolution. Total Quality Management, 1(1), 3-12.<br />
Lardner, R., Jr. (1990, March 19). U.S. News & World Report,<br />
Washington, DC, p. 27.<br />
Office of Management and Budget. (1989). Quality improvement<br />
prototype. Unpublished document available through the NASA Lewis<br />
Research Center, Cleveland, OH.<br />
Tuttle, T.C., & Weaver, C.N. (1986, November). Methodology for<br />
generating efficiency and effectiveness measures (MGEEM): A guide<br />
for Air Force measurement facilitators (AFHRL-TP-86-36, AD-A174<br />
574). Brooks AFB, TX: Manpower and Personnel Division, Air Force<br />
Human Resources Laboratory.<br />
32nd Conference of the Military Testing Association (MTA)<br />
An officer, a social scientist (and possibly a gentleman) in the Royal<br />
Netherlands Army (RNLA).<br />
Presentation by Co1 Dr. G.J.C. Roozendaal<br />
Head, Behavioural Science Division<br />
Directorate of Personnel RNLA<br />
The Royal Netherlands Army<br />
In peacetime the Royal Netherlands Army has 78,000 employees, consisting of<br />
23,000 regular servicemen, 43,000 conscript personnel and almost 12,000<br />
civilian employees. The RNLA can rapidly reach its wartime strength of<br />
200,000 men and women by calling up reserve personnel.<br />
The Royal Netherlands Army is a volunteer-conscript army, with national<br />
service (for men only) lasting 14 to 16 months. [!]<br />
Women have the right to serve, but only as volunteers. In principle all<br />
posts are open to women.<br />
The Royal Netherlands Army has its own military psychological and social<br />
service, comprising around 20 regular officers in the ranks from major up to<br />
and including brigadier-general.<br />
All these officers have graduated from a Dutch University in either<br />
psychology or sociology.<br />
Virtually all of these officers were given their basic training at the Royal<br />
<strong>Military</strong> Academy, after which they spent several years in active service as<br />
a platoon commander, company commander and/or a staff officer with an active<br />
unit.<br />
Only then have most officers completed their training as social scientists.<br />
Military behavioural scientists occupy various posts in different fields of<br />
work.<br />
Allow me to give you some examples:<br />
1. The personnel manager of the Directorate of Personnel RNLA is a<br />
brigadier-general psychologist.<br />
2. There are four colonels who act as, amongst others:<br />
- Head of the Behavioural Sciences Division;<br />
- Commander of the Didactics and Military Leadership Training Centre;<br />
- Instructor at the Royal Military Academy;<br />
- Head of the Individual Assistance Section.<br />
3. One lieutenant-colonel is Commander of the Selection Centre of the<br />
Royal Netherlands Army.<br />
In addition there are another fourteen officers in the ranks of major<br />
and lieutenant-colonel who occupy a wide range of research, policy and<br />
assistance posts.<br />
I shall now endeavour to use some examples to make it clear to you what<br />
exactly these officers do.
Research:<br />
In cooperation with a number of civil institutes, research is conducted<br />
into:<br />
a. Job satisfaction<br />
Every year a sample survey is carried out amongst 5% of regular<br />
personnel with regard to their well-being,<br />
motivation, their opinion on personnel policy and current matters.<br />
Finally, they are also asked if they contemplate leaving the service.<br />
This year the questions concerned the military personnel’s opinion on<br />
the change in East-West relations, and on the planned reductions in<br />
personnel.<br />
I am unfamiliar with your research experiences, but we at the RNLA have<br />
noticed that personnel have not lost their motivation for their military<br />
task, although they are uncertain as to whether their jobs will continue<br />
to exist. However, they are mainly of the opinion that the reductions<br />
will not affect them personally, but rather a colleague elsewhere.<br />
This affects the advisory policy to be pursued with regard to reductions<br />
in personnel.<br />
b. Exit interviews<br />
Exit interviews are held with all personnel leaving the Royal Netherlands<br />
Army prematurely. This yields information for the organisation as<br />
to how it is valued as an employer and how personnel policy can best be<br />
altered.<br />
In this context, extra measures have been taken in order to increase<br />
the ability to retain technical personnel, doctors and other<br />
highly trained personnel.<br />
The exit interviews also provide information enabling the policy to<br />
integrate women and ethnic groups to be adapted accordingly.<br />
Remarkably enough, the exit interview is now needed in order to<br />
determine which measures can be taken to achieve increased voluntary<br />
outflow in the light of the reductions in personnel - an interesting<br />
change in the use of exit interviews.<br />
c. Violence in the armed forces<br />
Research was conducted recently into the occurrence of violent incidents<br />
within the armed forces, covering all types of physical and/or mental<br />
violence ranging from coarse language, swearing, harassment and physical<br />
violence, right up to forms of sexual violence.<br />
The research showed that, fortunately, forms of serious sexual abuse<br />
are fairly rare.<br />
However, the research did lead to a series of recommendations<br />
as to how leadership qualities can be improved.<br />
These recommendations are now being implemented.<br />
d. Homosexuality research<br />
Following on from the above, research is currently being conducted into<br />
the extent to which homosexual soldiers experience forms of discrimination.<br />
A sample survey is also being conducted amongst all soldiers with<br />
regard to their attitude towards homosexual colleagues. This survey is<br />
still in its initial stages, but it will certainly enjoy particularly<br />
strong political interest.<br />
e. Deployability research<br />
On several occasions my division has conducted research into the effect<br />
of lengthy exercises on the deployability of regular and conscript<br />
military personnel.<br />
This research has resulted in the adjustment of the operational plans<br />
formulated by the Netherlands Army Staff.<br />
f. Miscellaneous<br />
Without entering into any detail, I shall just mention research which<br />
my division has conducted regarding: the use of alcohol, drugs and<br />
gambling addiction, the integration of women in the RNLA, unit consultations,<br />
conscript NCOs, and so on.<br />
Selection of personnel<br />
a. My division recently developed and implemented a personality test with<br />
a view to selecting prospective conscript personnel on their suitability<br />
for compulsory national service. This can limit the number of<br />
conscripts who dysfunction for psychological reasons.<br />
b. A procedure has been developed for prospective regular officers and<br />
NCOs to determine more accurately their suitability for the Royal<br />
Netherlands Army. This procedure uses personality studies and biographical<br />
data, compiled through standard interviews.<br />
All interviewers have been trained at length (approximately 2 years).<br />
Validation studies have shown that a well-trained<br />
interviewer is far more capable of selecting the right man or woman for<br />
the army than is a study of his/her abilities.<br />
c. A computerised 10-task test is currently being developed in cooperation<br />
with the Institute for Perception (TNO).<br />
The test is designed to gauge both stress tolerance and capacity for<br />
multiple information processing. For validation purposes, these computer<br />
tests are now included in the existing test procedures.<br />
d. In a while I shall discuss the research we have conducted in order to<br />
b. Our research into battlefield conduct has led to techniques now being<br />
introduced, which can reduce the effects of battle stress.<br />
The techniques are applied at individual level in activities which are<br />
by their very nature stressful, such as parachuting, rock climbing or<br />
diving.<br />
Furthermore, commanders are thoroughly prepared for the effects of<br />
stress and battle stress, and they are taught how to recognise stress<br />
symptoms and how to act when faced with them.<br />
c. We have conducted intensive research into the effects of lack of sleep<br />
over a long period.<br />
Just one of the things this has revealed is how lack of sleep<br />
influences leadership.<br />
After 48 hours' sleep deprivation the effectiveness of decisions taken<br />
declines dramatically. The factor causing the most concern is that<br />
commanders often do not realise, or realise only vaguely, that they are<br />
no longer capable of making responsible decisions.<br />
Symptoms of this kind have given rise to a great deal of attention<br />
being paid to the aspect of sleep management. A remarkable fact is that<br />
many commanders reject the implementation of sleep management, as they<br />
deem it un-military.<br />
However, we shall persevere.<br />
d. The RNLA has paid too little attention to Psychological Operations<br />
(Psycops) for too long. In fact, until recently the subject was not<br />
open to discussion on a political level, even in today’s free society.<br />
Psychological defence (preparing oneself for the adversary’s psychological<br />
operations) was all that was politically acceptable at that time.<br />
Recently however, the subject of Psycops has been attracting more<br />
attention, something which has been partly influenced by the attention<br />
we have paid to the effects of battle stress.<br />
For us this constitutes an interesting topic for research; one about<br />
which we think we can learn a lot from others.<br />
e. The attention given to battlefield behaviour, which I have already<br />
mentioned, has led to an entirely different structure of the treatment<br />
of combat stress victims in actual wartime conditions. Based on the<br />
well-known principles for treating combat stress victims,<br />
- proximity (treatment at the front)<br />
- immediacy (treatment as soon as the symptoms occur and as quickly as<br />
possible)<br />
- expectancy (treatment to return the victims to active service), we<br />
formed a system of combat stress recovery units for our 1 (NL) Army<br />
Corps.<br />
The combat stress recovery units are located in the rear areas of the<br />
brigades and must be able to operate as mobile units.<br />
The battalion aid post could serve as a collection point for battle<br />
stress victims; this is where triage takes place.<br />
After triage the battle stress victims are treated in the battle stress<br />
recovery unit. The head of the battle stress recovery unit will be an<br />
officer from our psychological and social service, one of our trained<br />
psychotherapists in fact.<br />
We estimate that 25% of all victims will be battle stress victims. Our<br />
objective is to return 50% of them to active duty within two days, and<br />
ultimately 80% within seven days.<br />
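The planning figures above can be worked through directly: of every 100 casualties, an estimated 25 would be battle stress victims, with targets of 50% returned within two days and 80% within seven. A sketch, using an illustrative casualty count of 100:<br />

```python
# Working through the battle stress recovery planning figures above.
# The casualty total of 100 is illustrative; the rates come from the text.
casualties = 100
stress_victims = casualties * 0.25       # estimated 25% are battle stress
within_two_days = stress_victims * 0.50  # target: 50% back within 2 days
within_seven_days = stress_victims * 0.80  # target: 80% back within 7 days
print(stress_victims, within_two_days, within_seven_days)
```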
The heads of the battle stress recovery units will also be able to act<br />
as staff officers to the brigade commander.<br />
Their responsibilities will include the prevention of stress-related<br />
problems.<br />
Our battle stress recovery organisation is not yet ready, but we aim to<br />
have it operational by 1991.<br />
In order to prevent symptoms of PTSD (post-traumatic stress disorder),<br />
analysis is now taking place in order to determine whether, in the<br />
event of the RNLA being deployed for peace-keeping operations, the<br />
assignment of military psychologists at a lower level would be worthwhile.<br />
Assessments<br />
f. Earlier this year we introduced a new procedure for assessing regular<br />
personnel. In this system, more emphasis is laid on the influence of<br />
assessments on the management development system. In order to exclude<br />
undesired effects such as the unequal distribution of power, stereotyping<br />
and so on, every battalion now has a so-called assessment advisor.<br />
We have thoroughly trained these advisors, who are intended to support<br />
both the commanders and the individual soldiers to be assessed.<br />
This system was implemented at the beginning of this year.<br />
Evaluations to see whether the objective has been met will take place<br />
in 1991.<br />
Education and training<br />
a. In 1985 the Didactics and <strong>Military</strong> Leadership Training Centre was<br />
established for the RNLA. One of my colleagues, a colonel and a<br />
sociologist, is the commander of this centre.<br />
All future military instructors are trained at the centre, as are the<br />
so-called didactics specialists who are appointed to every training<br />
centre.<br />
b. The centre also offers possibilities for leadership training to military<br />
commanders of all levels.<br />
Team-building procedures are developed and distributed from this<br />
training centre.<br />
c. Finally, the training centre plays a leading part in the use of new<br />
teaching methods such as computer-based instruction, simulators, wargames,<br />
the training of social skills, and so forth.<br />
Care of dysfunctioning personnel<br />
a. One of my colleagues, a colonel and a clinical psychologist/<br />
psychotherapist, is head of our Psychological Support Section.<br />
His section comprises three offices; the head of each of these offices<br />
is a military psychologist.<br />
b. Every year these offices give psychological (psychotherapeutic)<br />
aid to some 5,000 soldiers. Sometimes the problems are<br />
simple, and can be solved simply by advising a transfer. Sometimes,<br />
however, the problems are related to psychologically more complex<br />
matters involving alcohol, drug and gambling addiction, family problems<br />
and individual dysfunctioning. A number of these problems stems from<br />
traumatic service experiences, accidents, shooting incidents and the<br />
delayed effects of PTSD or battle stress.<br />
c. By virtue of this very experience, the psychologists from these offices<br />
are exceptionally well deployable in the battle stress recovery units I<br />
described earlier.<br />
The integration of women in the RNLA<br />
a. As most of you are probably aware, all posts in the Netherlands armed<br />
forces have in principle been accessible to women since 1978 (including<br />
combat duties).<br />
I need hardly remind you that this does not mean there are no problems<br />
involved with the integration - on the contrary, there are.<br />
b. Some of these problems are obviously caused by the<br />
differences in physical strength and [powers of?] endurance between men<br />
and women.<br />
These facts are generally accepted, and are therefore open to discussion.<br />
Moreover, as is the case in the Royal Netherlands Army, effective<br />
policy measures can be<br />
implemented to cope with these differences.<br />
Allow me to give you an illustration. All military posts are classified<br />
according to physical demand.<br />
A method has been developed which measures physical strength in men and<br />
women. Because the job requirements for men and women are the same in<br />
principle, this results in there being relatively few women in physically<br />
demanding posts.<br />
c. One particular problem is that few women are prepared to enter into<br />
long-term contracts. This was one of the reasons for our making all<br />
conscript posts - in principle - accessible to women.<br />
For many this was a way of getting to know the RNLA. For some this has<br />
already led to a job with the regular personnel.<br />
d. Measures have been implemented which until recently were highly<br />
controversial: parental leave has been introduced (for both men and<br />
women), day-care centres have been set up, part-time work has been<br />
introduced for soldiers, and women now have the opportunity (after a<br />
certain time) to return to the RNLA in order to resume a military<br />
career interrupted by parental duties.<br />
Officers from the psychological and social service have played a<br />
significant role in the implementation of all these measures.<br />
I myself have been very closely involved, as I am chairman of the<br />
working group responsible for the preparations for the integration of<br />
women (and also that of ethnic groups).<br />
e. Furthermore, I have been commander of the RNLA's selection centre for<br />
three years - I shall tell you more about this and the relevance it<br />
bears to this conference.<br />
As you are aware, psychological tests show, on average, differences in<br />
scores between men and women.<br />
In 1985 the tests gave the following differences in percentages of men<br />
and women passing final selection:<br />
men: 40%<br />
women: 20%<br />
Further research revealed that these differences in percentages were<br />
brought about mainly by the original version of our practical technical<br />
ability test.<br />
A mere 25% of all women came above the cut-off score, as opposed to 75%<br />
of all men.<br />
f. This was one of the reasons behind our subjecting this test to a<br />
thorough item-bias study, which led to the test being drastically<br />
adapted.<br />
Women still score lower results than men in this test, but the differences<br />
are now much smaller: 50% of women and 75% of men now meet the<br />
demands set in this test.<br />
We have also modified the procedures.<br />
Compensation and more differentiation according to position have now<br />
been introduced, and we have adapted our<br />
personality tests and the interview.<br />
Without going into too much detail, I can also tell you that we<br />
currently set the same demands and the same tests for men and women,<br />
with the same cut-off scores, based on the same number of correct<br />
answers.<br />
Moreover, the numbers of men and women in percentages passing final<br />
selection are now almost equal, as the following illustrates:<br />
men: 45%<br />
women: 40%<br />
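The before-and-after figures can be expressed as a selection-ratio comparison. The four-fifths (80%) guideline used below is an illustrative adverse-impact heuristic added here, not part of the RNLA procedure; the percentages are those quoted above.<br />

```python
# Sketch: compare female/male pass rates before and after the test revision.
# The 0.80 benchmark is an illustrative adverse-impact rule of thumb; the
# pass rates are the figures quoted in the text.

def impact_ratio(rate_focal: float, rate_reference: float) -> float:
    """Ratio of the focal group's pass rate to the reference group's."""
    return rate_focal / rate_reference

before = impact_ratio(0.20, 0.40)   # 1985: women 20%, men 40%
after = impact_ratio(0.40, 0.45)    # after revision: women 40%, men 45%

print(f"before revision: {before:.2f} (below 0.80: {before < 0.80})")
print(f"after revision:  {after:.2f} (below 0.80: {after < 0.80})")
```

Before the revision the ratio is 0.50; after it, roughly 0.89, above the 0.80 benchmark.<br />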
g. This is the only way in which we can achieve our objective of women<br />
comprising 10% of the army by 1993. You will<br />
understand that this is no simple task for an organisation in which the<br />
same demands are set for both men and women, nor for one which is to be<br />
reduced by 30% over the next few years.<br />
Reductions in personnel<br />
This brings me to the final point I wish to bring to your attention. Many<br />
armed forces will have to make considerable reduct.i.ons over the next few<br />
years; as I have already mentioned, this will amount to 30% for the RNLA.<br />
472
The question now is how to approach this, and how to help in one’s capacity<br />
as a military psychologist.<br />
Allow me to give you some examples of our contribution in this matter:<br />
Exit interviews: determining the ways of promoting the voluntary outflow<br />
of personnel.<br />
Out-placement: assisting in the search for a job outside the army.<br />
Information: advising on the policy to be pursued and its psychological<br />
consequences for personnel.<br />
Individual assistance: in cases where enforced discharge is<br />
unavoidable (etc.).<br />
This is perhaps a rather gloomy note on which to finish, but it nevertheless<br />
illustrates how valuable such a widely-deployable military psychological and<br />
social service can be.<br />
473
Acceptance of Change<br />
An Empirical Test of a Causal Model<br />
Edith Lynne Goldberg<br />
State University of New York, Albany<br />
John P. Sheposh<br />
Joyce Shettel-Neuber<br />
Navy Personnel Research and Development Center, San Diego, CA<br />
Abstract<br />
This study examined the effect of climate in combination with other factors on<br />
perceived value and acceptance of changes in three public sector (Department of<br />
Defense) organizations that had adopted new approaches to managing human<br />
resources. A conceptual model was proposed and tested to convey the interactive<br />
nature of the set of factors selected as important to acceptance of the changes.<br />
In general the hypothesized interrelationships were supported by the data. The<br />
assessment of the specific changes during the period of implementation was<br />
influenced by organizational contextual factors (CLIMATE). The assessment of the<br />
specific changes, in turn, affected perceived consequences of the changes which<br />
influenced the desire to retain the changes. This last factor, which could be<br />
construed as intentionality, is considered an important underpinning or precursor<br />
to the final stage of institutionalization. The combination of predictors in the<br />
model accounted for 56% of the variance. Theoretical and applied issues were<br />
discussed and future research suggested.<br />
In response to need or opportunity, organizations put into place planned changes that alter<br />
or replace existing procedures, products, processes, and/or policies. The implementation phase--<br />
what happens after a decision has been made to adopt the change--is a critical period in the<br />
success of the change. The research that has been accumulated on implementation has reported<br />
numerous failures (Bardach, 1977; Schultz & Slevin, 1976); therefore, research which could lead<br />
to the identification, examination, and better understanding of factors important to the<br />
successful implementation of organizational change is needed.<br />
A wide range of factors could plausibly influence the implementation of a change in an<br />
organization. Certain factors have been identified by most experts on the subject as playing a<br />
significant role in the adoption and implementation of change (cf. Sheposh, Hulton, & Knudsen,<br />
1983). One of the major factors that has been cited as influencing implementation and<br />
institutionalization of change is the organizational climate of the adopting unit (Glaser, 1973).<br />
In general, climate has been regarded as a perception of the organization by its employees which<br />
is shaped by experiences within the organization. Climate is viewed as influencing the behavior<br />
of organizational members, distinguishing one organization from another, and enduring over<br />
time (Gordon & Cummins, 1979; James & Jones, 1979; Schneider & Snyder, 1975).<br />
This paper reports on research which examined the effect of climate in combination with<br />
other factors on perceived value and acceptance of changes in three public sector (Department<br />
of Defense) organizations that had adopted new approaches to managing human resources. A<br />
conceptual model was proposed and tested to convey the interactive nature of the set of factors<br />
that were selected as important to acceptance of the changes. The model, the variables<br />
comprising the model, and the proposed causal linkages are presented in Figure 1.<br />
The opinions expressed in this paper are those of the authors, are not official, and do not<br />
necessarily reflect the views of the Navy Department.<br />
Figure 1. Proposed model of acceptance of institutionalization of change. [Diagram: LEVEL and CLIMATE lead to the specific changes, which lead to CONSEQUENCES OF CHANGES, which lead to ACCEPTANCE OF INSTITUTIONALIZATION OF CHANGE.]<br />
According to the model, LEVEL in the organization (i.e., first-line supervisors, managers)<br />
and CLIMATE represent exogenous variables. LEVEL was included because descriptions of<br />
organizational climate differ among hierarchical levels within an organization. Payne and<br />
Mansfield (1973), for example, reported that those individuals who were higher on the<br />
organizational hierarchy tended to perceive their organization as more democratic, friendly, and<br />
ready to innovate than those who were lower. As conveyed in the model, CLIMATE has a direct<br />
influence on specific aspects of the three changes that were being implemented. Organizational<br />
climate was expected to affect the extent to which specific changes produce benefits, because a<br />
change is more likely to succeed in an organization where the climate is open, accommodating<br />
to change, and in general positive. The combined effects of the specific changes in turn should<br />
significantly affect (increase or decrease) managers’ and supervisors’ ability to manage<br />
personnel-related matters in their work (CONSEQUENCES OF CHANGES). These perceived effects<br />
were expected to have a direct bearing on their willingness to institutionalize the particular set<br />
of changes that were being implemented (ACCEPTANCE OF INSTITUTIONALIZATION OF CHANGE).<br />
This index was included because previous research (Berman & McLaughlin, 1978) suggested that<br />
the question of institutionalization of a change is distinctly separate from that of<br />
implementation. Berman and McLaughlin concluded that initial adoption of a change does not<br />
ensure implementation nor does successful implementation necessarily ensure continuation of the<br />
change. It was hypothesized that in this study the perceived success of the changes during the<br />
implementation phase, as gauged by assessment of specific aspects of the changes (SPECIFIC<br />
CHANGES), and the perceived consequences (CONSEQUENCES OF CHANGES), which are to some<br />
extent determined by the climate of the organization will tend to produce broad based support,<br />
which would be instrumental in promoting the continuation of the changes (ACCEPTANCE OF<br />
INSTITUTIONALIZATION OF CHANGE).<br />
Method and Procedures<br />
Organizations<br />
Three Department of Defense (DOD) organizations, which provide logistical support for the<br />
armed services, served as research sites. Their functions include storing, shipping, and issuing<br />
materials and monitoring contracts with private sector businesses. They are staffed by civil<br />
service employees and a few military officers in top management positions.<br />
Subjects<br />
The data in this study were based on the questionnaire responses of a random sample of<br />
211 supervisors and managers from first-line level and above.<br />
Innovations<br />
As part of a 3-year experiment designed to improve human resource management, a<br />
package of three changes was proposed and implemented at each of the three sites. One change<br />
involved the Delegation of Classification Authority to line management, allowing those most<br />
familiar with positions under them to assign series and grades to jobs rather than having<br />
personnelists do so. The second change, Nonpunitive Discipline, was established to substitute<br />
letters of warning for reprimands and short suspensions. The initiative was intended to improve<br />
475
supervisor-subordinate relations, make employees take responsibility for correcting problem<br />
behavior, and save money and productivity lost to suspensions. The third initiative, the<br />
Elimination of Mandatory Interviews, removed an agency requirement that all candidates for a<br />
job be interviewed and allowed appointing officials to interview some, all, or none of the<br />
candidates for a position after reviewing their written applications.<br />
Materials<br />
A questionnaire, developed to measure respondents' perceptions of climate and the specific<br />
changes, was administered one year after program implementation began. The first part of the<br />
instrument included questions regarding demographic characteristics of the respondents and<br />
perceptions of organizational climate. Organizational climate was adapted from several<br />
questionnaires (Gordon & Cummins, 1979; Siegel & Kaemmerer, 1978; Mowday & Steers, 1979;<br />
and Young, Riedel, & Sheposh, 1979). It consisted of 47 items which represented nine<br />
organizational dimensions (e.g., organizational climate, management style, organizational<br />
effectiveness). Seven-point scales were used for all dimensions except organizational<br />
effectiveness, which was measured on a nine-point scale.<br />
The second half of the survey assessed the specific changes and related issues. Three<br />
aspects of the changes were addressed. First, a set of items using 7-point scales was developed<br />
to assess the specific initiatives. For example, the ease, efficiency, and fairness of the<br />
Elimination of Mandatory Interviews initiative were measured by three items employing 7-point<br />
response scales. Second, perceived consequences resulting from the specific changes (e.g.,<br />
augmented authority and increased ease in carrying out personnel actions) were measured with<br />
7-point scales. Third, the general acceptance of the initiatives, preference for these changes over<br />
the old system, and the extent to which respondents wanted the changes to continue were<br />
assessed with three items employing 5-point scales.<br />
Results<br />
The mean responses for the components comprising the model for first-line supervisors and<br />
managers and for the overall sample are presented in Table 1. In general managers gave a<br />
slightly more positive assessment of the organization’s climate, the individual changes, the<br />
perceived consequences, and the acceptance of the institutionalization of the changes.<br />
Significant differences between supervisors’ and managers’ ratings were obtained for<br />
ELIMINATION OF MANDATORY INTERVIEWS (F(1, 199) = 7.12, p < .01) and for ACCEPTANCE OF THE<br />
INSTITUTIONALIZATION OF THE CHANGES (F(1, 199) = 13.11, p < .01).<br />
As shown in Table 1 the supervisors and managers assessed each of the specific changes<br />
favorably. For example, they agreed that the elimination of mandatory interviews is easy to<br />
carry out, results in fair selection of candidates, and results in positions being rapidly filled.<br />
Similarly, the supervisors and managers perceived benefits resulting from the combined changes<br />
(e.g., perceived increases in their authority to influence classification decisions, the overall<br />
productivity of the work teams). They did not perceive differences with respect to meeting job<br />
responsibilities or filling positions as a result of the inception of these changes. Finally,<br />
concerning the institutionalization of the changes, managers and supervisors are positive about<br />
these changes, prefer the new system over the old, and would like to see the changes continued<br />
in their work setting.<br />
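The supervisor/manager comparisons reported above are two-group one-way analyses of variance. As a minimal sketch (the ratings below are invented for illustration, not the study's data; with two groups, F equals the squared t statistic):<br />

```python
# Sketch of a two-group one-way ANOVA F statistic: the ratio of the
# between-groups mean square to the within-groups mean square.
from statistics import mean

def f_oneway_two_groups(g1: list[float], g2: list[float]) -> float:
    grand = mean(g1 + g2)
    # between-groups sum of squares (df = 1 for two groups)
    ss_between = len(g1) * (mean(g1) - grand) ** 2 + len(g2) * (mean(g2) - grand) ** 2
    # within-groups sum of squares (df = n1 + n2 - 2)
    ss_within = sum((x - mean(g1)) ** 2 for x in g1) + sum((x - mean(g2)) ** 2 for x in g2)
    df_within = len(g1) + len(g2) - 2
    return ss_between / (ss_within / df_within)

supervisors = [4.0, 3.5, 4.5, 3.0, 4.0]   # invented 5-point ratings
managers = [4.5, 5.0, 4.0, 4.5, 5.0]
print(f"F(1, 8) = {f_oneway_two_groups(supervisors, managers):.2f}")
```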
Figure 2. Model of acceptance based on path analyses.<br />
A path analysis was applied to determine the correspondence between the data and the<br />
proposed model as described in Figure 1. The results of the path analysis are presented in<br />
Figure 2, and the correlation matrix underlying the analysis is presented in Table 2. The<br />
ordering of the variables and their interrelationships as presented in Figure 2 generally<br />
correspond to the structure of the proposed model. As hypothesized the LEVEL variable is most<br />
strongly related to CLIMATE which in turn directly influences the three changes. Assessment of<br />
the three changes has a significant relationship with CONSEQUENCES BENEFITS but not with<br />
CONSEQUENCES EFFORT. The differentiation of consequences into two types was made on the<br />
basis of a factor analysis which generated two independent factors. Both sets of consequences<br />
are significantly related to ACCEPTANCE OF THE INSTITUTIONALIZATION OF THE CHANGES with the<br />
CONSEQUENCES BENEFITS clearly the strongest predictor. The model accounted for a good<br />
amount of the total variance (R2 =.56). In addition to the absence of a significant relationship<br />
between the specific changes and the CONSEQUENCES EFFORT variable, the ordering of effects<br />
that were obtained for ELIMINATION OF MANDATORY INTERVIEWS departed from the proposed<br />
model. The direct relationship between this change and acceptance was stronger than the<br />
relationship of this variable to consequences.<br />
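The path-analytic logic can be illustrated with a small sketch: with standardized variables, the coefficient in a bivariate regression equals the Pearson correlation, and an indirect effect is the product of the coefficients along the chain (here CLIMATE through a specific change to acceptance). All data below are invented for illustration; the study's own coefficients appear in Figure 2 and Table 2.<br />

```python
# Sketch: bivariate standardized path coefficients and an indirect effect.
# Data are invented for illustration only.
from math import sqrt
from statistics import mean

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation, equal to the standardized bivariate slope."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

climate = [3.0, 4.5, 2.5, 5.0, 3.5, 4.0, 2.0, 4.8]  # perceived climate
change = [2.8, 4.0, 2.6, 4.9, 3.2, 4.1, 2.2, 4.5]   # assessment of a change
accept = [3.1, 4.2, 2.4, 5.1, 3.0, 4.3, 2.1, 4.7]   # acceptance rating

a = pearson(climate, change)   # CLIMATE -> SPECIFIC CHANGE path
b = pearson(change, accept)    # SPECIFIC CHANGE -> ACCEPTANCE path
print(f"a = {a:.2f}, b = {b:.2f}, indirect effect = {a * b:.2f}")
```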
477<br />
Table 2<br />
Zero-Order Correlations for Model Components<br />
Variable                                     1     2     3     4     5     6     7<br />
1. Level                                     -<br />
2. Climate                                 .20*    -<br />
3. Delegation of Classification Authority  .00   .28*    -<br />
4. Letters of Warning                      .08   .32*  .31*    -<br />
5. Elimination of Mandatory Interviews     .14*  .15*  .29*  .27*    -<br />
6. Consequences (Effort)                   .00  -.01   .00   .01   .08     -<br />
7. Consequences (Benefits)                 .09   .30*  .50*  .37*  .32*  .11     -<br />
8. Acceptance for Innovation               .20*  .29*  .44*  .44*  .55*  .20*   --<br />
Regression analyses were replicated by employing LISREL (Joreskog & Sorbom, 1984).<br />
This approach uses equations with more explicit specifications and simultaneous estimates of<br />
hypothesized underlying relationships and unexplained variance. LISREL provides a more<br />
holistic approach in comparison to separate regression analyses (Bagozzi & Phillips, 1982) and<br />
served to test the goodness-of-fit of the model in this study. The variable LEVEL did not meet<br />
the specifications of the model and could not be entered as a component. The model yielded a<br />
goodness-of-fit (GFI) measure of .94 with an adjusted GFI (AGFI) of .93, and a root mean<br />
square residual (RMSR) equal to .15. This model appears to be a very reasonable explanation of<br />
the relationships between these variables and their ability to predict acceptance for innovation.<br />
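The RMSR reported for the LISREL solution summarizes how far the model-implied correlations fall from the observed ones. A minimal sketch of the computation over the unique off-diagonal elements (the matrices below are invented for illustration):<br />

```python
# Sketch: root mean square residual between an observed and a model-implied
# correlation matrix, averaged over the lower-triangle elements.
from math import sqrt

def rmsr(observed: list[list[float]], implied: list[list[float]]) -> float:
    resid, n = 0.0, 0
    for i in range(len(observed)):
        for j in range(i):  # unique off-diagonal (lower-triangle) cells
            resid += (observed[i][j] - implied[i][j]) ** 2
            n += 1
    return sqrt(resid / n)

# Invented 3-variable example (lower triangles hold the correlations).
obs = [[1.0, 0.0, 0.0], [0.30, 1.0, 0.0], [0.40, 0.50, 1.0]]
imp = [[1.0, 0.0, 0.0], [0.25, 1.0, 0.0], [0.40, 0.55, 1.0]]
print(f"RMSR = {rmsr(obs, imp):.3f}")
```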
While the data were generally consistent with the model there were some discrepancies.<br />
Contrary to expectations, all these changes were found to exert a direct effect on ACCEPTANCE<br />
as well as having a direct effect on CONSEQUENCES. It appears that the changes and<br />
CONSEQUENCES are neither empirically distinct nor do they seem to function in an exactly<br />
similar fashion. The other not readily explainable departure from the proposed model is the<br />
lack of significant relationships between CONSEQUENCES (EFFORT) and the factors hypothesized<br />
as the determinants of this factor.<br />
Conclusions<br />
The present research proposed and tested a model that incorporated components<br />
hypothesized as relevant to the assessment of the status of a set of changes being implemented.<br />
In general the hypothesized interrelationships were supported by the data. The assessment of<br />
the specific changes during the period of implementation was influenced by organizational<br />
contextual factors (CLIMATE). The assessment of the specific changes, in turn, affected<br />
perceived consequences of the changes which influenced the desire to retain the changes. This<br />
last factor could be construed as intentionality, an important underpinning of or precursor to<br />
the final stage of institutionalization. The combination of predictors in the model accounted for<br />
56% of the variance.<br />
Several conclusions are evident from results based on using regression and structural<br />
equations. Consistent with past research (Glaser, 1973), hierarchical level and organizational<br />
climate were found to be important factors for predicting acceptance of change, but, as the<br />
present results suggest, they operate as indirect rather than direct predictors. The pattern of<br />
results thus suggests that simple bivariate correlations cannot adequately capture the CLIMATE -<br />
ACCEPTANCE OF CHANGE relationship. In addition the present model suggests that a combination<br />
of general information about the organization (e.g., LEVEL, CLIMATE) and more specific<br />
information about outcomes brought about by the changes are necessary to better understand<br />
and assess the status of the changes under study and to better predict their future acceptance.<br />
* p < .05, N = 211<br />
The results have several implications from an applied perspective. First, the use of<br />
measures assessing aspects of the organizational context and its relationship to the perceived<br />
value of the changes underscores the necessity to consider not only the specific features of the<br />
changes but also how the organization operates and functions in the cultivation and promotion<br />
of the changes. Second, the measurement of the changes in terms of their ability to produce<br />
certain expected outcomes is useful in determining the extent to which the changes are<br />
operating as intended. This information can be helpful particularly in the formative stage of an<br />
evaluation when providing feedback to those implementing the changes. Third, as the<br />
implementation process continues and evolves over time the predictive ability of the model can<br />
be determined. To the extent this model successfully predicts the status of the changes it can<br />
then be used during implementation of other changes that are introduced into an organization.<br />
In summary the proposed model, comprised of variables selected on the basis of theoretical<br />
considerations as well as the nature of the changes that were introduced and implemented, has<br />
shown promise as a framework for predicting and understanding the implementation and<br />
acceptance of change in an organizational setting. There are recognized limitations in this<br />
study. It is clear additional research is required. Continued assessment of the changes over<br />
time is needed to ascertain the predictive effectiveness of the model. Additional testing of the<br />
model on other types of changes and in other organizations is called for in order to determine<br />
its effectiveness.<br />
REFERENCES<br />
Bagozzi, R.P., & Phillips, L.W. (1982). Representing and testing organizational theories: A<br />
holistic construal. Administrative Science Quarterly, 27, 459-489.<br />
Bardach, E. (1977). The implementation game. Cambridge, MA: MIT Press.<br />
Berman, P., & McLaughlin, M.W. (1978, May). Federal programs supporting educational<br />
change, Vol. VIII: Implementing and sustaining innovations. Santa Monica, CA: Rand.<br />
Glaser, E.M. (1973). Knowledge transfer and institutional change. Professional Psychology, 4,<br />
434-444.<br />
Gordon, G.G., & Cummins, W. (1979). Managing management climate. Lexington, MA:<br />
Lexington Books.<br />
James, L.R., & Jones, A.P. (1979, April). Perceived job characteristics and job satisfaction: An<br />
examination of reciprocal causation (Report No. 79-5). Fort Worth, TX: Texas<br />
Christian University, Institute of Behavioral Research.<br />
Joreskog, K.G., & Sorbom, D. (1984). LISREL VI: Analysis of linear structural relationships by<br />
the method of maximum likelihood. Chicago: National Educational Resources.<br />
Mowday, R.T., Steers, R.M., & Porter, L.W. (1979). The measurement of organizational<br />
commitment. Journal of Vocational Behavior, 14, 224-247.<br />
Payne, R.L., & Mansfield, R. (1973). Relationship of perceptions of organizational climate to<br />
organizational structure, context, and hierarchical position. Administrative Science<br />
Quarterly, 18, 515-526.<br />
Schneider, B., & Snyder, R.A. (1975). Some relationships between job satisfaction and<br />
organizational climate. Journal of Applied Psychology, 60(3), 318-328.<br />
Schultz, R.L., & Slevin, D.P. (1976). Implementation and organizational validity: An empirical<br />
investigation. In R.H. Kilman, L.R. Pondy, & D.P. Slevin (Eds.), The management of<br />
organization design. New York: Elsevier North-Holland.<br />
Siegel, S.M., & Kaemmerer, W.F. (1978). Measuring the perceived support for innovation in<br />
organizations. Journal of Applied Psychology, 63(5), 553-562.<br />
Sheposh, J.P., Hulton, V.N., & Knudsen, G.A. (1983, February). Implementation of planned<br />
change: A review of major issues (NPRDC TR 83-7). San Diego, CA: Navy Personnel<br />
Research and Development Center.<br />
Young, L.E., Riedel, J.A., & Sheposh, J.P. (1979). Relationship between perceptions of role<br />
stress and individual, organizational, and environmental variables (NPRDC TR 80-8).<br />
San Diego, CA: Navy Personnel Research and Development Center.<br />
479
TWEEDDALE, J. W. (Chair), Chief of Naval Education and Training,<br />
Pensacola, FL.<br />
Annually, approximately 40,000 prospective college students request<br />
information on the NROTC scholarship program. About 12,000<br />
individuals apply and become finalists for NROTC scholarships.<br />
Four-year scholarships are ultimately awarded to approximately 1,500<br />
of the applicants. The scholarship pays for tuition, textbooks,<br />
instructional fees, and summer training periods, as well as provides<br />
the selectee with $100 per month (for a maximum of 40 months).<br />
Selectees may become a member of any of the 66 NROTC units that<br />
service over 120 colleges and universities located nationwide.<br />
The presentations in this symposium describe the procedures used to<br />
select NROTC scholarship recipients. CDR Bob Hawkins of the Naval<br />
Education and Training Program Management Support Activity will<br />
present an overview of the NROTC selection process. Jack Edwards of<br />
Navy Personnel Research and Development Center will present a paper<br />
that was coauthored with Regina Burch (Colorado State University) and<br />
Norman Abrahams (Personnel Decisions Research Institutes, Inc.). He<br />
will review the steps used to revise the NROTC selection composite.<br />
Third, Wally Borman from the University of South Florida will discuss<br />
a recently developed, behaviorally anchored selection interview and a<br />
newly constructed biodata instrument. Finally, I will highlight the<br />
current and future research objectives for the NROTC scholarship<br />
selection system.<br />
TWEEDDALE, J. W., Chief of Naval Education and Training,<br />
Pensacola, FL<br />
Improving procedures for the selection of future officers is<br />
complicated by the longitudinal nature of the research. For<br />
example, if the criterion is whether an individual will remain<br />
following completion of obligated duty, it may take 8 to 10<br />
years for the criterion data to become mature. Also, the<br />
divergent criteria (college grade point average, grade point<br />
average in naval science courses, and military performance<br />
while in NROTC and later in the Navy) used to assess the<br />
accuracy of the NROTC scholarship selection system may present<br />
problems.<br />
The need to monitor the validity of the current predictors and<br />
develop new predictor and criterion measures are but two of the<br />
research needs currently confronting NROTC researchers. In<br />
addition to capturing readily quantifiable information, efforts<br />
have been put forth to capture various other characteristics of<br />
"the whole person." Now, researchers, CNET staff, and<br />
Professors of Naval Science are examining ways to operationalize,<br />
measure, and validate those characteristics. The<br />
whole-person model will continue to guide NROTC scholarship<br />
selection research in this time of change for the Navy.<br />
480
GATHERING AND USING NAVAL RESERVE OFFICERS TRAINING CORPS<br />
SCHOLARSHIP INFORMATION<br />
Robert B. Hawkins, Commander, U.S. Navy<br />
Naval Education and Training<br />
Program Management Support Activity<br />
Naval Air Station, Pensacola, FL<br />
Introduction<br />
The responsibility for identifying potential Naval Reserve<br />
Officers Training Corps (NROTC) Scholarship applicants,<br />
processing applications, identifying scholarship winners, and<br />
then placing those selected at one of the 66 host or the more<br />
than 100 associated crosstown affiliated NROTC universities is<br />
divided between two separate Navy commands: the Commander,<br />
Navy Recruiting Command (CNRC) and the Chief of Naval Education<br />
and Training (CNET).<br />
Until the 1986/87 NROTC scholarship year, CNRC identified<br />
applicants, processed applications, and selected NROTC<br />
Scholarship winners. Scholarship winners were identified<br />
during two week-long selection board sessions, an early<br />
selection board held in November, and a second board in<br />
February. CNET then took the administrative action required to<br />
determine final program eligibility (physical qualification)<br />
and provided the authorization for a selectee to attend an<br />
NROTC university under scholarship.<br />
Responsibility for selecting NROTC Scholarship recipients was<br />
transferred from CNRC to CNET after the November 1986 early<br />
selection board. CNET then instituted weekly selection boards<br />
to replace the standard two-selection-board process. Selection<br />
board membership remained essentially the same, with selection<br />
board members drawn from NROTC units (commanding officers) and<br />
the NROTC staff. However, unlike the two-selection-board<br />
system where the same board members evaluated all applicants<br />
during a single session, a weekly selection board process<br />
required the use of different selection board members for each<br />
selection board. Thus, concerns about scoring consistency and<br />
the equity of evaluation had to be addressed.<br />
Application Solicitation<br />
CNRC begins the applicant identification process in March each<br />
year. The primary target market is the high school junior<br />
(rising senior) class. Potential applicants are identified<br />
through a variety of means, but primarily by screening the<br />
Preliminary Scholastic Aptitude Test (PSAT) and Armed Services<br />
Vocational Aptitude Battery (ASVAB) high scorers lists.<br />
Additionally, numerous high school and college fair presentations<br />
are made to generate interest among the college bound<br />
high school student population.<br />
481
____ _.... ___----- .._._ - ___.I.-.__ _.. ~_<br />
A student applying for an NROTC Scholarship completes an<br />
initial applicant questionnaire which establishes his or her<br />
interest. The data supplied on the applicant questionnaire is<br />
used to create a file for the student in the NROTC data base.<br />
The student must then take either the Scholastic Aptitude Test<br />
(SAT) or American College Test (ACT) and request that his or<br />
her scores be released to the NROTC Scholarship Program.<br />
ACT and SAT test data for those students who authorize score<br />
release to the NROTC Program are periodically received by CNRC<br />
and matched with the NROTC interested student file. Those<br />
meeting minimum eligibility scores (presently 450 verbal and<br />
500 math for SAT; 21 English and 23 math for ACT) are invited<br />
to complete a scholarship application. Completed applications<br />
are then compiled by CNRC, forwarded to CNET, and presented to<br />
the selection board for evaluation.<br />
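The minimum-eligibility screen described above amounts to a simple threshold filter. A sketch using the cut scores quoted in the text (SAT 450 verbal / 500 math; ACT 21 English / 23 math); the record fields are illustrative, not the actual CNRC file layout:<br />

```python
# Sketch: invite applicants meeting the minimum SAT or ACT scores quoted
# in the text. Field names are hypothetical, for illustration only.

def eligible(record: dict) -> bool:
    if record["test"] == "SAT":
        return record["verbal"] >= 450 and record["math"] >= 500
    if record["test"] == "ACT":
        return record["english"] >= 21 and record["math"] >= 23
    return False

applicants = [
    {"name": "A", "test": "SAT", "verbal": 470, "math": 510},
    {"name": "B", "test": "SAT", "verbal": 440, "math": 560},  # verbal too low
    {"name": "C", "test": "ACT", "english": 22, "math": 25},
]
invited = [a["name"] for a in applicants if eligible(a)]
print(invited)  # prints ['A', 'C']
```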
Application Evaluation<br />
__-...--------<br />
In the two-selection-board system of evaluation used by CNRC,<br />
applications were grouped by state, and three- or four-member<br />
selection committees were established to evaluate applicants in<br />
each state group. The number to be selected from a particular<br />
state was provided to the selection committee, and applicants<br />
were selected to meet that target. The early selection board<br />
(November) considered all applications received prior to the<br />
board convening date. The second selection board (February)<br />
evaluated all applications received by the scholarship<br />
application submission deadline, including those of individuals<br />
who were evaluated but not selected by the early selection<br />
board.<br />
In the 1986/87 scholarship year, 50 percent of those selected<br />
to receive an NROTC scholarship were identified by the early<br />
(November) selection board. The balance of those selected were<br />
identified through a series of weekly selection boards which<br />
met from January through March of 1987.<br />
To ensure consistency of scoring, application evaluation<br />
procedures used by each of the weekly selection boards were<br />
similar to those used during the two-selection-board process.<br />
Under these procedures, selection board members were given very<br />
broad guidance and complete discretion in the awarding of<br />
evaluation points. Selection boards each had up to 100 points<br />
available to award to each applicant. The points awarded by<br />
the selection board were then added to a previously calculated<br />
base score called the applicant Quality Index (QI). The<br />
applicant QI is an optimally weighted selection composite<br />
developed by the Navy Personnel Research and Development Center<br />
to predict student academic and military performance criteria.<br />
The sum of the selection board score and the Quality Index<br />
determined the applicant's rank ordered position in the group<br />
of all applicants evaluated during the weekly selection board<br />
process.<br />
necessary to respond to the results of that review. The<br />
following system, with minor modifications, has been used in the<br />
NROTC selection process ever since.<br />
Current Selection Evaluation System<br />
The current NROTC Scholarship selection board evaluation system<br />
uses the Quality Index as a Base Score for each applicant. The<br />
Quality Index accounts for approximately 66 percent of the<br />
total applicant selection score. The selection board provides<br />
the remaining 34 percent.<br />
The selection board is provided with an evaluation score sheet<br />
that defines specific areas for selection board scoring<br />
consideration. The score sheet is divided into three broad<br />
areas of evaluation: scholarship, military bearing, and<br />
personal attributes. Contained within each area are specific<br />
scoring categories with each category assigned a maximum point<br />
value. Approximately 40 percent of the scoring categories<br />
include a recommended selection board score previously<br />
calculated by computer using established algorithms and raw<br />
data derived through the optical scan process (dot counting).<br />
Selection board members evaluate the application and assign<br />
points for each category. They may adjust the computer-recommended<br />
scores, if desired. The number of points assigned<br />
by the selection board member, including those recommended by<br />
computer, is then added to the Quality Index to determine the<br />
applicant's final standing in the rank ordered list of all<br />
applicants.<br />
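The arithmetic of the final standing can be sketched as follows; the applicant names, point values, and Quality Index scale shown here are hypothetical:

```python
# Illustrative sketch: each applicant's final score is the Quality Index
# (base score) plus the points assigned by the selection board; applicants
# are then rank-ordered on that sum.  All values below are hypothetical.
applicants = [
    {"name": "A", "quality_index": 190.0, "board_points": 62.0},
    {"name": "B", "quality_index": 205.0, "board_points": 41.0},
    {"name": "C", "quality_index": 198.0, "board_points": 55.0},
]

for a in applicants:
    a["final_score"] = a["quality_index"] + a["board_points"]

# Highest final score first determines the applicant's standing.
rank_ordered = sorted(applicants, key=lambda a: a["final_score"], reverse=True)
print([a["name"] for a in rank_ordered])  # ['C', 'A', 'B']
```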
The categories used by the selection board to evaluate a<br />
scholarship applicant are:<br />
Military Potential:<br />
Is applicant a military dependent? (Score recommended)<br />
Athletic participation (Score recommended)<br />
JROTC/CAP participation (Score recommended)<br />
Applicant physical fitness<br />
Motivation for the NROTC Program<br />
Interviewing officer evaluation<br />
Personal Factors:<br />
Leadership positions held (Score recommended)<br />
Involvement in non-school activity<br />
Did applicant experience adversity?<br />
Strength of character<br />
Exceptional achievement<br />
Potential for graduating with a tech degree (Score recommended)<br />
Scholarship:<br />
Quality of learning environment<br />
Transcript evaluation for math/science performance<br />
Intellectual motivation<br />
Teacher evaluations<br />
Evaluation of applicant's statement<br />
Course difficulty<br />
(Three of the six Scholarship categories also carry computer-recommended scores.)<br />
Scholarships are offered based upon the rank order of all<br />
applicants. Adjustments may be necessary to meet specifically<br />
assigned state scholarship allocation targets, active duty,<br />
female, and minority targets, or Navy physical qualifications.<br />
Summary<br />
The current selection board process has worked extremely well.<br />
The structure built into the evaluation system provides the<br />
consistency of applicant evaluation desired in a 6-month<br />
selection board cycle with varied selection board membership.<br />
The cost of that consistency, a limitation in selection board<br />
flexibility, appears to have had a positive effect as well.<br />
Selection board members feel comfortable working within the<br />
more structured system and selection or non-selection decisions<br />
are much more defensible. More importantly, several measures<br />
of incoming freshman class performance indicate that the<br />
process improved the selection board's ability to identify<br />
those most likely to perform well once enrolled in the NROTC<br />
Program. The performance of the scholarship students entering<br />
the program since the revised selection procedures were fully<br />
implemented has improved, with the average freshman year grade<br />
point average increasing from 2.89 in 1988 to 3.0 this past<br />
year. Freshman attrition has also decreased dramatically.<br />
Twenty-two percent of the freshman class attrited from the<br />
program during the 1988 academic year. Freshman attrition for<br />
the 1990 academic year dropped to 14 percent. The selection<br />
board average applicant score dropped to less than 50 percent<br />
of the total points available for awarding. This ensures that<br />
truly exceptional applicants can be awarded enough points for<br />
scholarship selection.<br />
References<br />
Mattson, J.D., Neumann, I., & Abrahams, N.M. (1986).<br />
Development of a revised composite for NROTC selection<br />
(NPRDC TN 87-7). San Diego: Navy Personnel Research and<br />
Development Center.<br />
Owens-Kurtz, C.K., Borman, W.C., Gialluca, K.A., Abrahams,<br />
N.M., & Mattson, J.D. (1989). Refinement of the Navy<br />
Reserve Officer Training Corps (NROTC) scholarship<br />
selection composite (NPRDC Tech. Note TN 90-1). San<br />
Diego: Navy Personnel Research and Development Center.<br />
Validation of the Naval Reserve Officers Training Corps Quality Index¹<br />
Jack E. Edwards Regina L. Burch Norman M. Abrahams<br />
Navy Personnel Research Colorado State University Personnel Decisions<br />
and Development Center Ft. Collins, CO Research Institute, Inc.<br />
San Diego, CA Minneapolis, MN<br />
Using data from Naval Reserve Officers Training Corps (NROTC) entering classes of 1979 and 1980,<br />
Mattson, Neumann, and Abrahams (1986) optimally weighted six academic and personal factors: Scholastic<br />
Aptitude Test-Verbal (SATV), Scholastic Aptitude Test-Math (SATM), high school rating (HSR), an interviewer’s<br />
rating (INTER), the Strong-Campbell Interest Inventory career-tenure scale (SCII), and the Background<br />
Questionnaire career-tenure scale (BQ), to develop a selection composite for predicting three criteria. Recent<br />
Navy policy directed toward increasing the proportion of college graduates with technical degrees has made it<br />
necessary to develop and validate a new selection system that adds a new criterion, choice of technical major<br />
(TECH) to the three previously investigated criteria: college grade point average (GPA), naval aptitude grades<br />
(APT), and naval science grades (NSG).<br />
Objective<br />
The objective of this paper is to review the development and validation of the new NROTC scholarship<br />
selection composite, the 1989 Quality Index (QI-89). Three steps were included: (a) developing the optimally<br />
weighted QI-89; (b) predicting a new criterion (TECH); and (c) constructing an expectancy table/chart to predict<br />
TECH using a new predictor, engineering-and-science-interest score (ES).<br />
Approach<br />
Population<br />
The population contained 6,609 individuals who had entered NROTC from 1983 to 1987 and completed at<br />
least one semester/quarter of the program. Men comprised 96.5% of the population, and 92.6% of the candidates<br />
were nonminorities. Each person had received a four-year national competition scholarship; had complete data<br />
on all seven predictors; had valid scores for GPA, APT, and NSG; were Navy (versus Marine) option; and had<br />
a selection code of principal selectee, early select, alternate best, or alternate middle.<br />
Predictors<br />
Six predictors were used to develop the selection composite. A seventh predictor (ES) was used to develop<br />
an expectancy chart for predicting TECH.<br />
SATV and SATM or American College Test (ACT) equivalents. These scores represent the verbal and<br />
quantitative aptitudes of an individual as measured by a national competitive testing program designed for college<br />
admissions and scholarship awards. If an individual took the standardized test(s) on multiple occasions, the<br />
highest score was used in the analyses. ACT scores were translated to equivalent SATV and SATM scores using<br />
a recently developed conversion table (Owens-Kurtz, Borman, Gialluca, Abrahams, & Mattson, 1989).<br />
HSR. This measure is based on high school rank in class. It was computed with a two-step procedure.<br />
First, a percentile rank was determined by multiplying high school rank by 2, subtracting 1 from that product,<br />
and then dividing the difference by the product of class size times 2. Second, each resulting percentile rank was<br />
converted to an equivalent HSR via tabled values. This second step lessened the effect of the negatively skewed<br />
distribution of percentile ranks. HSR values can range from 0 to 100 in increments of 10.<br />
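The first step of the HSR computation reduces to a one-line formula. The second step, a table lookup to an HSR value of 0 to 100 in increments of 10, is not reproduced here because the tabled values are not given in the text:

```python
# Step one of the two-step HSR procedure described in the text:
# percentile rank = (2 * rank - 1) / (2 * class size).
# The table lookup to an equivalent HSR (step two) is not shown.
def percentile_rank(rank_in_class, class_size):
    """Percentile rank from high school rank-in-class and class size."""
    return (2 * rank_in_class - 1) / (2 * class_size)

# A student ranked 5th in a class of 200:
print(percentile_rank(5, 200))  # 0.0225
```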
INTER. During a 15-minute interview, an officer rated an applicant on factors important to a career as a<br />
1 This research was supported by the Office of Naval Technology, Program Element 0602233N. The<br />
opinions expressed in this paper are those of the authors, are not official, and do not necessarily reflect the views<br />
of the Navy Department. This paper was presented in November 1990 at the annual meeting of the Military<br />
Testing Association at Orange Beach, AL as part of J. W. Tweeddale's (Chair) symposium, The Naval Reserve<br />
Officers Training Corps (NROTC) Scholarship Selection System.<br />
naval officer (e.g., poise and the officer's willingness to have the individual serve under his/her command). Each<br />
applicant was assigned an overall rating of very high (1) to very poor (5). For consistency, this scale was reverse<br />
scored.<br />
SCII. This scale consists of 76 item-responses from the Strong-Campbell Interest Inventory that predict<br />
officer retention for at least one year beyond the minimum obligated service (Neumann & Abrahams, 1978a).<br />
The authors reported a biserial correlation of .19 between the SCII and extended service. Scores can range from<br />
62 to 138.<br />
BQ. The career tenure scale, developed in 1981, is based on 14 biodata and personality items from<br />
Rimland's (1957) Background Questionnaire. Neumann (personal communication, 1989) reported a biserial<br />
correlation of .12 between the BQ and NROTC attrition. Scores can range from 93 to 107.<br />
ES. Engineering and science interests are identified through 132 item-responses from the Strong-Campbell<br />
Interest Inventory (Neumann & Abrahams, 1978b). The authors reported biserial correlations of .56 and .58<br />
between ES and choice of final major for two cross-validation samples. Scores on this scale can range from 31<br />
to 163.<br />
Criteria<br />
Four performance criteria were used individually or in composites. When the four single-criterion regression<br />
equations were combined into a composite, the following weights were assigned to the criteria: 40% for GPA,<br />
30% for APT, 20% for NSG, and 10% for TECH. Scores on GPA, APT, and NSG were standardized (using<br />
T-scores) within each host or cross-enrollment school. For individuals who attrited prior to the end of the first<br />
academic year, scores were cumulated to the time the individual left the NROTC program.<br />
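The standardization and criterion weighting just described can be sketched as follows. This is a simplification: in the actual procedure the weights were applied when combining the single-criterion regression equations, not the raw criterion scores, and the data here are hypothetical:

```python
# Sketch of the criterion handling described in the text: GPA, APT, and
# NSG are standardized to T-scores (mean 50, SD 10) within each school,
# and the four criteria carry weights of 40/30/20/10 percent.
from statistics import mean, pstdev

def t_scores(values):
    """Convert raw scores to T-scores (mean 50, SD 10), population SD."""
    m, sd = mean(values), pstdev(values)
    return [50 + 10 * (v - m) / sd for v in values]

WEIGHTS = {"GPA": 0.40, "APT": 0.30, "NSG": 0.20, "TECH": 0.10}

def criterion_composite(gpa_t, apt_t, nsg_t, tech_t):
    """Weighted combination of the four (standardized) criteria."""
    return (WEIGHTS["GPA"] * gpa_t + WEIGHTS["APT"] * apt_t
            + WEIGHTS["NSG"] * nsg_t + WEIGHTS["TECH"] * tech_t)

school_gpas = [2.1, 2.8, 3.0, 3.4, 3.9]  # hypothetical one-school data
print(round(t_scores(school_gpas)[2], 2))  # ≈ 49.34
```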
First-Year GPA. This measure is the grade-point average obtained from all college courses that were taken<br />
during the first academic year.<br />
First-Year APT. APT is the first-academic-year grade-point average in nonacademic military aspects of the<br />
NROTC program. An individual is assigned a grade of 0 to 4.00 by NROTC instructors on each of<br />
approximately 20 performance aspects and personal traits (e.g., goal setting and military bearing). APT is<br />
primarily used to determine how well an individual is adapting to the Navy and NROTC.<br />
First-Year NSG. This measure is the grade-point average for naval science courses taken during the first<br />
academic year. These courses are Navy-relevant academic classes that include subjects such as navigation and<br />
seamanship. Students must take eight such courses; most students take one course each semester.<br />
Final TECH. Majors were categorized as either non-technical (1) or technical (2) using categories that were<br />
obtained from the Chief of Naval Education and Training (CNET). Individuals with valid scores for TECH<br />
represented a subset of the larger sample. TECH was considered valid if the candidate had entered college in<br />
(a) 1983, 1984, or 1985 or (b) 1986 and had completed at least one semester/quarter of his/her junior year.<br />
TECH was included as an additional criterion in an attempt to maximize the number of scholarships awarded to<br />
applicants who would eventually choose a technical college major.<br />
Procedure<br />
Development and cross-validation samples. The 5,957 people entering NROTC between 1983 and 1986 were<br />
randomly assigned to either a development or cross-validation sample (N = 3,652 and N = 2,305, respectively).<br />
A third sample, 652 individuals who entered NROTC during 1987, was used as a second cross-validation sample<br />
to ensure that weights remained stable for the most recent year for which criterion data were available.<br />
Developing and cross-validating optimally weighted composites. Validity coefficients corrected for range<br />
restriction were used in multiple regression analyses to develop optimally weighted selection composites for<br />
predicting each of the four individual criteria. Although this procedure results in four separate composite scores,<br />
applicants must ultimately be rank-ordered on a single metric in order to make selection decisions. To obtain<br />
such an overall composite, the single-criterion composites were combined into the QI-89 in order to predict the<br />
four single criteria simultaneously. Weights were derived for these overall composites by combining predictor<br />
weights obtained for the single-criterion regression equations.<br />
Each of the composites was then cross-validated on both hold-out samples. The composites were evaluated<br />
for their ability to predict GPA, APT, NSG, and TECH in the 1983-1986 sample, and their ability to predict<br />
GPA, APT, and NSG in the 1987 sample.<br />
Determining effective weights. To assess the percentage of weight that each predictor received in the<br />
selection composites, effective weights scaled to 100% were computed. To compute the effective weights, the<br />
unstandardized b weights for each predictor within a composite were first multiplied by the corresponding standard<br />
deviation for that predictor. The products of b times SD were then summed across all the predictors included<br />
in the composite. The product for each predictor was then divided by that sum and multiplied by 100.<br />
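The effective-weight computation can be expressed directly; the b weights and standard deviations below are hypothetical:

```python
# Effective weights as described in the text: multiply each predictor's
# unstandardized b weight by that predictor's SD, divide each product by
# the sum of all such products, and scale to 100%.
def effective_weights(b_weights, sds):
    products = [b * sd for b, sd in zip(b_weights, sds)]
    total = sum(products)
    return [100 * p / total for p in products]

# Hypothetical b weights and SDs for, e.g., SATV, SATM, HSR, INTER:
b  = [0.010, 0.012, 0.150, 2.00]
sd = [76.8, 64.4, 16.8, 0.48]
print([round(w, 1) for w in effective_weights(b, sd)])  # [15.3, 15.4, 50.2, 19.1]
```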
Constructing the expectancy table. The expectancy table for using the ES scale was developed by first rank-<br />
ordering the scores of midshipmen. Then, the distribution was divided as equally as possible into five groups.<br />
For each fifth, ranging from high to low, the percentage of technical majors was computed.<br />
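A sketch of the expectancy-table construction just described, using hypothetical ES scores and major choices:

```python
# Expectancy-table construction as described: rank-order the ES scores,
# split the distribution into five (near-)equal groups, and compute the
# percentage of technical majors within each fifth.  Data are hypothetical.
def expectancy_fifths(records):
    """records: list of (es_score, is_technical) pairs; returns the
    percentage of technical majors in each fifth, highest fifth first."""
    ordered = sorted(records, key=lambda r: r[0], reverse=True)
    n = len(ordered)
    fifths = [ordered[round(i * n / 5):round((i + 1) * n / 5)]
              for i in range(5)]
    return [100 * sum(tech for _, tech in g) / len(g) for g in fifths]

data = [(130, 1), (125, 1), (118, 1), (115, 0), (110, 1),
        (108, 0), (104, 0), (100, 1), (96, 0), (90, 0)]
print(expectancy_fifths(data))  # [100.0, 50.0, 50.0, 50.0, 0.0]
```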
Results and Discussion<br />
Development Sample<br />
Table 1 shows the means and standard deviations for the predictors and criteria, and the correlations for<br />
those two sets of variables. This information is provided for both the entire development sample and the<br />
development subsample that had valid scores on the TECH criterion. The validities were corrected for restriction<br />
in range prior to performing the regressions. Validities increased approximately .02 to .03 after corrections.<br />
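The paper does not state which range-restriction correction was applied; one common univariate choice (Thorndike's Case 2 formula, for direct selection on the predictor) is sketched here as an assumption:

```python
# Thorndike Case 2 correction for direct range restriction on the
# predictor: inflate the restricted validity r toward its estimated
# unrestricted value.  Whether this exact formula was used in the
# study is an assumption; it is shown only as a common choice.
from math import sqrt

def correct_range_restriction(r, sd_unrestricted, sd_restricted):
    k = sd_unrestricted / sd_restricted
    return (r * k) / sqrt(1 - r**2 + (r**2) * k**2)

# A restricted validity of .25 when the applicant SD is 1.5x the selected SD:
print(round(correct_range_restriction(0.25, 1.5, 1.0), 3))  # ≈ 0.361
```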
The SATM means indicate that the average NROTC scholarship student scored at approximately the 89th<br />
percentile in mathematics aptitude. The SATV and HSR means are also above average. For SATV, the average<br />
NROTC scholarship student outperforms approximately 87 percent of the college-bound seniors taking the test.<br />
Finally, the average NROTC scholarship student had an HSR of 73.55. That HSR value indicates that the average<br />
NROTC scholarship recipient graduated in the top 10% of his/her high school class.<br />
There were negligible differences between the predictor and criterion means and standard deviations for the<br />
full development sample and its subsample. The intercorrelations among GPA, APT, and NSG were slightly<br />
lower for the development subsample than for the full development sample. The three criteria were moderately<br />
correlated, with NSG and GPA being the most highly related. This result would be expected because these two<br />
criteria measure academic factors; furthermore, NSG is computed from a subset of the courses included in GPA.<br />
These relationships are also consistent with Mattson et al.'s (1986) findings. In that study, intercorrelations varied<br />
from .40 to .54, and GPA and NSG were the most highly intercorrelated of the three criteria. GPA showed the<br />
highest relationship with TECH, a criterion not included in earlier composites. The correlations between the three<br />
criteria and TECH were, however, much smaller than the intercorrelations among GPA, APT, and NSG.<br />
The predictor-criterion correlations varied little in magnitude between the full development sample and its<br />
subsample. For both groups, HSR was the variable most highly correlated with GPA and APT. SATV and HSR<br />
showed the strongest relationships with NSG. Although SATM and SATV showed strong relationships with GPA<br />
and NSG, respectively, they showed virtually no relationship with APT. The interview rating, however, was<br />
related to APT. These latter two sets of findings are consistent with the observation that GPA and NSG measure<br />
the academic performance of NROTC participants while APT measures military characteristics of the future<br />
officers. ES was the predictor most highly correlated with TECH. This outcome was expected because the ES<br />
scale was specifically developed to predict final major. Finally, SATM also had a strong association with TECH.<br />
Cross-Validation<br />
Predictor scores were computed for each of the five composites (i.e., the four single-criterion composites<br />
and the QI-89) using data from the hold-out samples. These scores were then correlated with each criterion.<br />
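The cross-validation step amounts to scoring hold-out cases with development-sample weights and correlating the resulting composite with each criterion. A sketch with hypothetical weights and data:

```python
# Cross-validation sketch: apply development-sample b weights to hold-out
# predictor data to get composite scores, then correlate those scores
# with a criterion.  Weights and data below are hypothetical.
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation using population moments."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))

weights = [0.4, 0.6]                                  # development-sample weights
predictors = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 5.0]]
criterion  = [1.2, 1.0, 2.4, 3.1]                     # hold-out criterion scores

composite = [sum(w * p for w, p in zip(weights, row)) for row in predictors]
print(round(pearson_r(composite, criterion), 3))      # ≈ 0.986
```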
Single-criterion composites. Table 2 shows the cross-validity coefficients that were obtained for the four<br />
single-criterion composites. The cross-validities for the single-criterion composites were provided principally to<br />
show the upper limit of prediction for a given criterion since each composite should predict its own criterion<br />
better than any of the other composites. To use the table, the criterion of interest is located in a right-hand<br />
column, and the predictor composite is located in the left-hand column. The cross-validity for that predictor-<br />
criterion combination is found at the intersection of the corresponding column and row. For example, the .122<br />
shown in the first row, second column indicates the cross-validity estimate that was obtained when weights that<br />
were derived to optimally predict GPA (for the development sample) were used to predict APT in the 1983-<br />
1986 hold-out sample.<br />
Table 1<br />
Descriptive Statistics for the Full Development Sample and the Development Subsample<br />
Variable        Mean      SD      r(GPA)    r(APT)<br />
Predictor<br />
SATV(f)        558.51    76.78     .124      .027<br />
SATV(r)        560.08    76.98     .101      .013<br />
SATM(f)        642.91    64.35     .187      .036<br />
SATM(r)        642.85    63.99     .183      .047<br />
HSR(f)          73.55    16.81     .272      .132<br />
HSR(r)          74.45    16.34     .280      .105<br />
INTER(f)         4.82      .48     .035      .093<br />
INTER(r)         4.83      .46     .030      .061<br />
SCII(f)        105.36     5.96    -.076     -.010<br />
SCII(r)        105.49     5.88    -.069     -.008<br />
BQ(f)          100.97     2.32     .007      .047<br />
BQ(r)          101.09     2.30    -.006      .010<br />
ES(f)          110.19    13.55     .013      .022<br />
ES(r)          110.62    13.28     .023      .033<br />
Criterion<br />
GPA(f)          49.88     9.66    1.000       ---<br />
GPA(r)          51.72     8.16    1.000       ---<br />
APT(f)          49.71     9.77     .425     1.000<br />
APT(r)          52.15     8.43     .363     1.000<br />
NSG(f)          49.80     9.77     .562      .419<br />
NSG(r)          51.66     8.57     .546      .363<br />
TECH(r)          1.59      .49     .243      .104<br />
f as a subscript denotes the full development sample (N = 3,652).<br />
r as a subscript denotes the reduced development subsample (N = 2,077).<br />
Of primary interest are the bold-faced values shown on the diagonal. These values reflect the predictive<br />
ability for each composite's target criterion; for example, the GPA composite reveals a cross-validity of .289 with<br />
the GPA criterion. These diagonal values may be compared with the four corresponding validities observed for<br />
these composites in the developmental sample: .327 for GPA; .175 for APT; .297 for NSG; and .455 for TECH.<br />
As expected, development-sample validities were slightly higher than the corresponding cross-validities. The GPA,<br />
NSG, APT, and TECH composites each predicted its target criterion better than any of the other composites.<br />
Surprisingly, the APT composite was a better predictor of GPA and NSG than of APT. Overall, three of the<br />
four composites (GPA, APT, and NSG) were better predictors of GPA and NSG than of APT and TECH.<br />
Table 2<br />
Cross-Validity Coefficients for Single-Criterion and QI-89 Composites<br />
Single-Criterion                      Criteria<br />
Predictor Composite     GPA      APT      NSG      TECH<br />
GPA(a)                 .289     .122     .237     .138(c)<br />
GPA(b)                 .304    -.001     .228      ---<br />
APT(a)                 .219     .137     .230     .108(c)<br />
APT(b)                 .220     .073     .193      ---<br />
NSG(a)                 .220     .118     .291     .121(c)<br />
NSG(b)                 .227     .009     .286      ---<br />
TECH(a)                .102     .036     .137     .430(c)<br />
TECH(b)                .123     .056     .063      ---<br />
QI-89(a)               .282     .126     .246     .131(c)<br />
QI-89(b)               .296     .011     .238      ---<br />
a as a subscript denotes the 1983-1986 hold-out sample (N = 2,305).<br />
b as a subscript denotes the 1987 hold-out sample (N = 652).<br />
c as a subscript denotes the 1983-1986 reduced hold-out sample (N = 1,313) with a valid TECH score.<br />
The coefficients obtained on the second cross-validation sample are shown directly under the bold-faced<br />
cross-validities. Across all four single-criterion composites, the cross-validities obtained on the 1987 sample varied<br />
little from those obtained on the 1983-1986 sample when GPA and NSG were predicted. All of the composites<br />
predicted GPA slightly better in the 1987 sample and NSG slightly better in the 1983-1986 sample. Somewhat<br />
larger differences were found when predicting APT. The GPA, APT, and NSG composites predicted APT better<br />
in the 1983-1986 sample than in the 1987 sample. The cross-validities for the two samples were both near .00<br />
when the TECH-derived composite was used to predict APT. Inspection of the correlations between APT and<br />
several highly-weighted predictors (i.e., HSR, SATM, and SATV) revealed that differences in zero-order validities<br />
for the two samples appeared to account for the subsequent differences in predictive ability for these composites.<br />
QI-89. The bottom portion of Table 2 contains cross-validity coefficients for QI-89. In general, the cross-<br />
validity coefficients obtained with the QI-89 showed little shrinkage from those obtained when each single-<br />
criterion predictor composite was used to predict itself. The one exception occurred when TECH was the<br />
criterion. This finding is logical because CNET assigned a relatively small importance rating to TECH when it<br />
was combined with the other three criteria. Although the QI-89 was only marginally useful for predicting APT<br />
and TECH, it retained a moderate level of predictive ability when used to predict GPA and NSG.<br />
Predicting Technical Majors<br />
As shown in Figure 1, those midshipmen in the upper 20% on the ES scale were more than twice as likely<br />
to choose technical majors as those in the lower 20%. To use the table, an individual's ES score is located<br />
in the table, and the likelihood of that individual selecting a technical final major can be determined. An adjunct<br />
table for estimating ES was used (rather than incorporating ES into the optimally weighted selection composite)<br />
so as to avoid eliminating applicants with outstanding credentials who might not receive NROTC scholarships if<br />
their interests tended toward non-technical fields of study.<br />
Conclusions<br />
1. Although ES is derived from an instrument (i.e., Strong-Campbell Interest Inventory) that is susceptible<br />
to distortion, results showed that ES can significantly increase the proportion of technical majors.<br />
2. While the academically oriented criteria (i.e., GPA and NSG) are predicted reasonably well, there is<br />
room for improvement for the military-performance criterion (APT).<br />
Figure 1<br />
Expected Percentages of Midshipmen Selecting Technical Majors<br />
(Bar chart grouping midshipmen into fifths by ES score, from the top 20% to the lowest 20%: 121 and above; 114 thru 120; 107 thru 113; 98 thru 106; 97 and below.)<br />
References<br />
Mattson, J.D., Neumann, I., & Abrahams, N.M. (1986). Development of a revised composite for NROTC<br />
selection (NPRDC TN 87-7). San Diego: Navy Personnel Research and Development Center.<br />
Neumann, I., & Abrahams, N.M. (1978a). Construction and validation of a Strong-Campbell Interest Inventory<br />
career tenure scale for use in selecting NROTC midshipmen (NPRDC Letter Rep.). San Diego: Navy<br />
Personnel Research and Development Center.<br />
Neumann, I., & Abrahams, N.M. (1978b). Identification of NROTC applicants with engineering and science<br />
interests (NPRDC Tech. Rep. 78-31). San Diego: Navy Personnel Research and Development Center.<br />
Owens-Kurtz, C.K., Borman, W.C., Gialluca, K.A., Abrahams, N.M., & Mattson, J.D. (1989). Refinement of<br />
the Navy Reserve Officer Training Corps (NROTC) scholarship selection composite (NPRDC Tech. Note<br />
TN 90-1). San Diego: Navy Personnel Research and Development Center.<br />
DEVELOPMENT AND IMPLEMENTATION OF A STRUCTURED<br />
INTERVIEW PROGRAM FOR NROTC SELECTION<br />
Walter C. Borman<br />
University of South Florida<br />
and Personnel Decisions Research Institutes, Inc.<br />
and<br />
Cynthia K. Owens-Kurtz<br />
and Teresa L. Russell<br />
Personnel Decisions Research Institutes, Inc.<br />
The Navy Reserve Officer Training Corps (NROTC) is one of the major<br />
sources of Navy and Marine Corps officers. Presently, 40,000 young men and<br />
women apply for a 4-year NROTC scholarship each year. Approximately 40% of<br />
this total pass an initial screen based on college board scores (minimum<br />
430 verbal and 520 math on SAT or equivalent ACT scores), proper age<br />
(between 17 and 21 when school starts, and no more than 25 at estimated<br />
time of college graduation), and acceptable progress through high school.<br />
Those passing the screen (called Board Eligibles) are required to complete<br />
an application blank and to interview with a Naval officer, typically at<br />
one of the 43 recruiting district headquarters. The focus of this paper is<br />
on this officer interview.<br />
As conducted previously, an officer interviewed each Board Eligible<br />
applicant, usually for 15-40 minutes depending upon the personal style of<br />
the interviewer and on the interview load (i.e., the number of NROTC Board<br />
Eligibles that must be interviewed that day). Interviews were unstructured<br />
in that interviewers were free to ask any questions they believed were<br />
relevant. After completion of the interview, the interviewer completed a<br />
brief rating form.<br />
Experience with the previous NROTC interview showed that ratings were<br />
often at or near the top (most effective) end. For example, the mean<br />
rating on the Overall Potential scale for Board Eligibles in the most<br />
recent class for which data were available (class entering NROTC 1985) was<br />
4.68 on the 5-point scale. Further, when interview ratings (on the<br />
Potential scale) were correlated with the NROTC performance criteria, grade<br />
point average (GPA), Naval science grades (NSG), and an aptitude rating<br />
(APT), results were near zero (Owens-Kurtz, Borman, Gialluca, Abrahams, &<br />
Mattson, 1988). Finally, the effective weights for the interview when used<br />
along with SAT scores, high school rank, and SCII/BQ scores in regression<br />
analyses against these criteria were very low for GPA and NSG and only the<br />
third highest contributor to prediction of APT (Owens-Kurtz et al., 1988).<br />
This research was supported by funds from the Office of Naval Technology,<br />
Program Element 0602233N. The opinions expressed are those of the authors<br />
and do not necessarily reflect those of the U.S. Navy.<br />
Accordingly, the NROTC selection interview program appeared to need<br />
improvement. The ratings on the interview form showed little<br />
differentiation between applicants, and the validity of the interview<br />
ratings was low.<br />
One plausible reason for problems with this interview is its<br />
unstructured format. Reviews of the employment interview (Arvey & Campion,<br />
1982; Schmitt, 1976) indicate that structured interviews generally provide<br />
more valid prediction of performance than do unstructured interviews. A<br />
recent meta-analysis found a .35 mean uncorrected validity coefficient for<br />
structured interviews, whereas unstructured interviews had a mean<br />
uncorrected validity of .11 for the studies included in the analysis<br />
(Cronshaw & Wiesner, 1989). It is possible that a structured interview for<br />
NROTC selection might improve the interview's validity for identifying<br />
applicants likely to succeed in the NROTC program.<br />
This paper first describes development of the structured interview<br />
materials and then an evaluation of interview ratings made during pilot<br />
tests of these materials.<br />
METHOD<br />
Identifying Target Predictor Constructs<br />
The first step in developing a structured interview protocol was to<br />
identify the predictor constructs the interview should target.<br />
Accordingly, meetings with officer staff members in five NROTC units<br />
were conducted to generate ideas for these predictor constructs. A<br />
preliminary list of constructs emerged from sessions with primarily COs and<br />
Class Advisors in these units. This list was briefed to the Chief of Naval<br />
Education and Training (CNET) staff and to Selection Board members and was<br />
then revised based on their feedback. The constructs are: NROTC Interest<br />
and Motivation; Leadership Potential; Responsibilities; Organization of<br />
Tasks and Activities; and Communication.<br />
Preparing Behavioral Statements for the Rating Scales<br />
At this point, we prepared preliminary behavioral statements to<br />
reflect effective, average, and ineffective interviewee responses in each<br />
one of the five construct areas. The behavioral statements were based on<br />
what recruiters with considerable experience in NROTC selection interviews<br />
had observed in actual interviews. We also received feedback from CNET and<br />
Selection Board members, and made final revisions. One of the resulting<br />
rating scales is shown below, with its behavioral standards.<br />
[Rating scale for NROTC Interest and Motivation. The behavioral standards<br />
describe, at the effective level, an applicant who expresses a strong desire<br />
to be a Naval officer, would probably accept the college program if rejected<br />
for a scholarship, and shows strong interest in the Navy/Marine Corps through<br />
impressive knowledge about the Naval Service/NROTC and thoughtful questions<br />
about the program; the average and ineffective standards describe<br />
correspondingly weaker commitment and interest.]<br />
Preparing Interview Questions<br />
After the interview rating scales were developed, we began preparing<br />
questions designed to probe for reports of past behavior relevant to<br />
effectiveness in each area. Several questions for each rating category<br />
were developed and tried out with recruiters. The recruiters used<br />
different questions with different applicants, and noted those that seemed<br />
to be most and least effective at eliciting responses useful for making<br />
ratings on each scale. The three to four questions that appeared most<br />
effective for each category were then presented to CNET, and final<br />
revisions to the questions were made.<br />
Preparing Interview Instructions, a Training Videotape, and the Interview<br />
Worksheet<br />
In addition to development of the interview protocol rating scales and<br />
the interview questions, it was necessary to prepare instructions and an<br />
interviewer training videotape to ensure the structured interview is<br />
conducted properly. Thus, instructions and the videotape were prepared,<br />
along with an interview worksheet, with the interview questions presented<br />
and space provided for the interviewer to take organized notes of<br />
interviewee responses. The instructions and accompanying videotape provide a<br />
brief training program on structured interviewing, explain proper use of<br />
the behavioral statements for guiding interview ratings, and orient the<br />
interviewer to use the questions, the worksheet, and the interview rating<br />
form.<br />
Pilot Testing the Structured Interview<br />
The interview protocol and rating form were pilot tested in two waves<br />
with a total of 31 officer interviewers and 93 applicants in seven<br />
different locations. Means and standard deviations of the ratings provided<br />
data on their spread and overall distribution.<br />
As part of the pilot testing, an interrater reliability study was<br />
conducted. One way to assess the quality of data emerging from the new<br />
structured interview is to determine how closely two interviewers agree in<br />
their independent ratings of the same interviewees. Thus, we initiated an<br />
interrater reliability study with 10 officers interviewing a total of 24<br />
applicants. All interviewers were trained to do the structured interview<br />
and to use the rating form. Each applicant was interviewed by two officer<br />
recruiters.<br />
After each interview session, the interviewer completed the rating<br />
form and provided a copy to the researcher. Officers interviewing the same<br />
applicant never discussed that applicant before making their ratings, so<br />
the interview judgments were generated totally independently. Intraclass<br />
correlations were computed for each dimension separately and for the sum of<br />
the dimension ratings. This provides an estimate of the across-interviewer<br />
consistency of ratings made using the new interview protocol and rating<br />
form.<br />
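For readers who want to reproduce the reliability analysis on their own data, a minimal present-day sketch of a two-rater intraclass correlation is given below. The one-way random-effects variant, ICC(1,1), is an assumption (the paper does not specify which ICC formula was used), and the ratings are invented:

```python
import numpy as np

def icc_oneway(scores):
    """One-way random-effects intraclass correlation, ICC(1,1),
    for an n-targets x k-raters matrix of ratings."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)
    msb = k * ((row_means - grand) ** 2).sum() / (n - 1)              # between-applicant MS
    msw = ((scores - row_means[:, None]) ** 2).sum() / (n * (k - 1))  # within-applicant MS
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical data: 6 applicants, each rated on one dimension by 2 interviewers
ratings = np.array([[4, 4], [5, 4], [2, 3], [3, 3], [5, 5], [1, 2]], float)
print(f"ICC(1,1) = {icc_oneway(ratings):.2f}")
```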
RESULTS<br />
Means and standard deviations for Wave 2 interview ratings, gathered<br />
after major revisions of the interview protocol (after Wave 1 pilot<br />
testing), appear in Table 1. For these 59 applicants, means are close to<br />
4.0 (on a 5-point scale) and standard deviations are approximately 1.0.<br />
Further, these means compare favorably with data for the previous interview<br />
(M=4.68 in 1985). Of course, this is not a very fair comparison because<br />
ratings on the new format were gathered for research, whereas the 4.68 rating<br />
is based on operational ratings. Nonetheless, applicant ratings using the<br />
new protocol appear to provide reasonable spread for the interview ratings<br />
of these typically high quality NROTC applicants.<br />
Table 1 also contains the interrater reliability coefficients for<br />
ratings made of the 24 applicants evaluated by two independent<br />
interviewers. These are very high reliabilities, with considerable<br />
agreement shown on the part of the interviewers.<br />
TABLE 1<br />
Means, Standard Deviations, and Interrater<br />
Reliability Coefficients for New Interview<br />
Protocol Ratings<br />
(N=59)<br />
Dimension                               Mean    SD   Reliability(a)<br />
NROTC Interest and Motivation           3.95   1.07      .81<br />
Leadership Potential                    3.78   1.26      .87<br />
Responsibilities                        4.00    .95      .81<br />
Organization of Tasks & Activities      4.20    .87      .83<br />
Communication                           4.24    .99      .93<br />
Overall Evaluation                      3.98    .97      .93<br />
Sum of First Five Dimensions           20.17   4.35      .95<br />
a. These are 2-rater intraclass correlations; N=24<br />
In addition, officer interviewers who used the new interview<br />
procedures were asked their opinions about this protocol compared to the<br />
previous one. Their comments are summarized in Table 2.<br />
TABLE 2<br />
Summary of Comments on the New<br />
Interview Protocol Rating Form<br />
• Big improvement over old form<br />
• Not too long or burdensome (interviews timed at 12-30 minutes<br />
including answering candidate questions, not filling out form; averaging<br />
about 15-18 minutes)<br />
• Concept of behavioral standards well understood and accepted<br />
• Worksheet especially helpful when doing several interviews back-to-back<br />
without completing the ratings<br />
• Videotape seen as very clear and useful<br />
• Interview program takes pressure off interviewer by providing good<br />
questions to ask<br />
• Interview program gives diverse interviewer types (e.g., NROTC staff,<br />
officer recruiters, Reservists, etc.) more common frame of reference<br />
DISCUSSION AND CONCLUSIONS<br />
Evaluations of the new interview materials and procedures by NROTC<br />
Selection Board members, NROTC officers, and officer recruiters responsible<br />
for interviewing NROTC applicants (as well as data from field tests of the<br />
interview), suggest that these materials and procedures are ready for<br />
implementation. What is most urgently needed to evaluate the usefulness of<br />
the new interview is criterion-related validity information. Future<br />
validation efforts will be important in evaluating the value of interview<br />
ratings by themselves and in combination with other measures (e.g., college<br />
board scores), in predicting important NROTC criteria such as GPA, NSG, and<br />
APT, and perhaps attrition from the scholarship program. The interrater<br />
reliability study on the new interview (Borman & Owens-Kurtz, 1989) and<br />
data from Table 1 suggest that the interview has good potential for<br />
improving the prediction of NROTC student performance. However, validity<br />
data are needed to assess its usefulness in actual practice.<br />
REFERENCES<br />
Borman, W. C., & Owens-Kurtz, C. K. (1989). Development and field test of<br />
a structured interview protocol for NROTC selection (Institute Report<br />
178). Minneapolis, MN: Personnel Decisions Research Institutes, Inc.<br />
Cronshaw, S. F., & Wiesner, W. H. (1989). The validity of the employment<br />
interview: Models for research and practice. In G. R. Ferris and R.<br />
W. Eder (Eds.), The employment interview: Theory, research and<br />
practice. Beverly Hills, CA: Sage.<br />
Owens-Kurtz, C. K., Borman, W. C., Gialluca, K. A., Abrahams, M. M., &<br />
Mattson, J. D. (1988). Refinement of the Navy Reserve Officer<br />
Training Corps (NROTC) scholarship selection composite (Institute<br />
Report 144). Minneapolis, MN: Personnel Decisions Research<br />
Institutes, Inc.<br />
Schmitt, N. (1976). Social and situational determinants of interview<br />
decisions: Implications for the employment interview. Personnel<br />
Psychology, 29, 79-101.<br />
Development of an Experimental Biodata/Temperament Inventory for NROTC Selection1<br />
Mary Ann Hanson and Cheryl Paullin<br />
Personnel Decisions Research Institutes, Inc.<br />
Walter C. Borman<br />
University of South Florida and Personnel Decisions Research Institutes, Inc.<br />
One component of the Naval Reserve Officer Training Corps (NROTC) Scholarship program selection<br />
process in need of revision or replacement is the Biographical Questionnaire (BQ). The BQ key<br />
(Neumann, Githens, & Abrahams, 1967), which was developed to predict officer retention beyond initial<br />
obligated service, is somewhat dated and does not correlate well with NROTC performance criteria. In<br />
addition, the BQ itself was developed over thirty years ago (Rimland, 1957). Much has been learned in<br />
the meantime about the development of biodata items, and many of the BQ items appear dated. Thus, the<br />
development of a new biodata instrument seemed in order. This paper will describe the development,<br />
preliminary evaluation, and refinement of an experimental biographical data and temperament inventory<br />
designed to predict NROTC performance and attrition.<br />
Method<br />
Developing the Pilot Profile of Experiences and Characteristics (PEC)<br />
A rational, construct-based approach was taken, both to develop and to refine this new experimental<br />
inventory. The first step in developing the inventory was to more clearly specify the criterion constructs<br />
it is designed to predict. Performance measures currently used by the NROTC were identified (e.g.,<br />
Naval Science Grades), and the constructs that underlie these performance measures were specified. The<br />
underlying performance constructs identified were academic achievement, leadership, military bearing,<br />
and goal setting. Attrition from the NROTC program occurs for a variety of reasons, and the underlying<br />
causes of attrition include academic failure, inaptitude, and dislike for the military (see Owens-Kurtz,<br />
Gialluca, & Borman, 1989). The present research focused on identifying predictors of the performance<br />
and attrition constructs for which prediction is presently poor. Because academic achievement and academic<br />
failure are predicted at least moderately well by existing predictors, less emphasis was placed on<br />
identifying predictors of these criteria.<br />
A literature review was conducted to identify individual differences constructs, especially biographical<br />
and temperament constructs, that have shown empirical links with criteria similar to the NROTC performance<br />
and attrition constructs in past research. Item-level validities for several other inventories were<br />
also reviewed. Eight individual differences constructs were identified that have been found, in past research,<br />
to be valid predictors of criteria similar to the NROTC performance/attrition constructs. These<br />
eight constructs were labeled: (1) Achievement Motivation; (2) Team Orientation; (3) Dominance; (4)<br />
Sociability; (5) Leadership Orientation; (6) NROTC/Military Interest and Motivation; (7) Organization<br />
and Planning; and (8) Responsibility.<br />
Items were written to tap each of these eight constructs. Past research (e.g., Doll, 1971) has shown<br />
that responses to verifiable items (i.e., items for which the truthfulness of responses can be checked using<br />
an external source) are less often distorted. Because biodata items typically deal with observable behav-<br />
1 This research was supported by funds from the Office of Naval Technology, Program Element 0602233N.<br />
The opinions expressed are those of the authors, and do not necessarily reflect those of the U.S. Navy.<br />
iors, these items are more likely to be verifiable. Thus, an effort was made to include as many biodata<br />
items as possible in the pilot version of the PEC. However, when sufficient numbers of biodata items<br />
could not be written to adequately cover a construct, temperament items were also included. Between 13<br />
and 21 items were written to tap each of the eight predictor constructs. In order to detect response distortion<br />
by applicants if it occurs, a ten item response validity scale (called the Unlikely Virtues scale) was<br />
also developed and included in the inventory. Thus, the pilot version of the inventory, called the Profile<br />
of Experiences and Characteristics (PEC), contained 151 items.<br />
Evaluating the PEC<br />
Both rational and empirical approaches were taken in evaluating and refining the PEC. The rational<br />
approach was a retranslation exercise in which researchers independently categorized the PEC items into<br />
the eight biodata/temperament constructs. The empirical approach involved administering the PEC to a<br />
large sample of NROTC applicants. The inventory was also administered to a comparison sample of<br />
NROTC scholarship students.<br />
Retranslation Exercise<br />
The retranslation exercise had two purposes: (1) to determine whether researchers would agree concerning<br />
the placement of items on constructs; and (2) to obtain information that could be used to further<br />
revise and refine the composition of the constructs and their definitions. Seven researchers who were<br />
knowledgeable about biodata and/or personality research were asked to independently sort each of the<br />
PEC items into one of the construct categories according to the perceived match between item and category<br />
content. The degree of agreement among these researchers concerning the placement of items was<br />
then evaluated.<br />
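The agreement tally itself is straightforward: an item "retranslates" cleanly when at least five of the seven judges place it in the same category. A sketch with hypothetical sorts (the item identifiers and judge choices below are invented for illustration):

```python
from collections import Counter

# Hypothetical sorts: the construct category each of 7 judges chose per item.
sorts = {
    "item_01": ["Dominance"] * 7,
    "item_02": ["Leadership Orientation"] * 5 + ["Dominance"] * 2,
    "item_03": ["Sociability"] * 4 + ["Team Orientation"] * 3,
}

def retranslated(judge_choices, threshold=5):
    """True when the modal category reaches the agreement threshold."""
    _, count = Counter(judge_choices).most_common(1)[0]
    return count >= threshold

agree = [item for item, choices in sorts.items() if retranslated(choices)]
print(f"{len(agree)} of {len(sorts)} items met the 5-of-7 criterion")
```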
Pilot Test<br />
The PEC was administered to all Board Eligible NROTC applicants who were processed for the<br />
1990 NROTC scholarship program between 18 December 1989 and 30 January 1990 as part of their application<br />
process. Completed PEC inventories were obtained for 972 NROTC applicants from nearly all<br />
of the 41 Navy Recruiting Districts. About 90 percent of the respondents in this pilot test sample were<br />
either 18 or 19 years old, and 91 percent were male.<br />
Frequency counts were conducted to identify and eliminate items where the vast majority of respondents<br />
marked the same response alternative. Next, a rational scoring scheme was developed so that a<br />
preliminary set of item- and scale-level scores could be computed. When criterion data become available,<br />
this scoring system may need to be modified. The item-level scores that were computed were intercorrelated<br />
and factor analyzed.<br />
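The frequency-count screen described above can be sketched as follows; the 90 percent cutoff matches the example given later in the Results, but the response data here are simulated rather than the actual PEC responses:

```python
import numpy as np

# Simulated response matrix: 972 respondents x 3 items, alternatives coded 0-4.
rng = np.random.default_rng(0)
responses = rng.integers(0, 5, size=(972, 3))
# Make the last item badly skewed: ~95% of respondents pick alternative 4.
responses[:, 2] = np.where(rng.random(972) < 0.95, 4, responses[:, 2])

def low_spread_items(resp, cutoff=0.90):
    """Flag items where a single alternative captures more than `cutoff`
    of all responses (too little spread to be useful)."""
    flagged = []
    for j in range(resp.shape[1]):
        counts = np.bincount(resp[:, j])
        if counts.max() / resp.shape[0] > cutoff:
            flagged.append(j)
    return flagged

print("items with unacceptable spread:", low_spread_items(responses))
```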
Comparison with “Honest” Sample<br />
In order to obtain some base rate information regarding how “honest” respondents (i.e., respondents<br />
who have little to gain by distorting their responses) score on the PEC, a comparison sample of students<br />
already enrolled in the NROTC scholarship program was administered the PEC under instructions to respond<br />
as honestly as possible. A total of 175 first-year NROTC scholarship students from the University<br />
of Minnesota, Notre Dame University, and Carnegie-Mellon University completed the PEC in January<br />
1990. This sample was 93 percent male.
Data from this comparison sample were scored using the same procedures that were used in the applicant<br />
sample. Mean item-level scores from the NROTC student sample were compared with those from<br />
the pilot-test sample in order to identify items with substantially different base rates. If an item’s mean<br />
score in the applicant group is slanted considerably more in the socially desirable direction than that of<br />
the student sample, it suggests that the item is relatively easily distorted by applicants.<br />
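The base-rate comparison reduces to a per-item difference in mean scores between the two samples. A sketch with invented item means and an arbitrary flagging cutoff (the paper does not state what difference was treated as "large"):

```python
import numpy as np

# Invented mean item-level scores (higher = more socially desirable).
applicant_means = np.array([0.81, 0.64, 0.92, 0.55])  # applicant sample
student_means   = np.array([0.78, 0.61, 0.71, 0.54])  # "honest" NROTC students

# Items slanted well above the honest baseline are candidates for distortion.
diffs = applicant_means - student_means
suspect = np.where(diffs > 0.15)[0]   # the 0.15 cutoff is an assumption
print("possibly distortable items:", suspect.tolist())
```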
Refining the PEC<br />
Results from the pilot test data analyses, along with the information from the retranslation exercise,<br />
were used to revise the composition of the PEC constructs and their definitions. The inventory was then<br />
refined and shortened for future administrations. Descriptive statistics, internal consistency reliabilities,<br />
and scale score intercorrelations were computed for the final shortened scales.<br />
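The internal consistency figures reported below in Table 1 are coefficient alphas (see the table note). A minimal sketch of the computation, on invented complete-data scores:

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a respondents x items score matrix
    (complete data only)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented scores for one short scale: 5 respondents x 4 items
scores = np.array([[1, 1, 1, 0],
                   [2, 2, 1, 2],
                   [0, 1, 0, 0],
                   [2, 1, 2, 2],
                   [1, 0, 1, 1]], float)
print(f"alpha = {cronbach_alpha(scores):.2f}")
```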
Results and Discussion<br />
Evaluating the PEC<br />
Retranslation Results<br />
Seventy-seven percent of the PEC items were sorted into the same predictor construct scale by five<br />
or more of the seven researchers who participated in the retranslation exercise. Seven of the remaining<br />
items were from the Unlikely Virtues (response validity) scale. It is not particularly surprising that some<br />
researchers sorted the Unlikely Virtues items into the construct categories. The Unlikely Virtues items<br />
were specifically written to resemble the eight original construct categories (so they would be subtle).<br />
The fact that some of the judges mistakenly sorted the Unlikely Virtues items into the construct categories<br />
suggests that the items are indeed subtle. In general, however, there was good agreement among the<br />
researchers concerning the placement of PEC items on constructs.<br />
Pilot-Test Results<br />
Frequency counts revealed that the vast majority of the PEC items had an adequate spread of responses<br />
across the response alternatives. Only a few items had response distributions that were considered<br />
unacceptable (e.g., over 90 percent of the respondents chose the most desirable response alternative).<br />
However, for some items the response distributions were much better than for others. This information<br />
was taken into account in refining the PEC, particularly in making decisions concerning which items to<br />
drop.<br />
The item-level intercorrelations were factor analyzed, and rotated principal factor solutions containing<br />
from 2 to 12 factors were examined. Based on a parallel analysis (Montanelli & Humphreys, 1976),<br />
the amount of variance accounted for by each factor, and the interpretability of the solutions, the eight<br />
factor solution was selected for further consideration.<br />
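Parallel analysis compares the eigenvalues of the observed correlation matrix with those expected from random data of the same dimensions, retaining factors only where the observed value exceeds the random benchmark. The sketch below is a simplified version of this idea (it omits the squared-multiple-correlation diagonal of the Montanelli & Humphreys procedure) and runs on simulated two-cluster data rather than the PEC items:

```python
import numpy as np

def parallel_analysis(data, n_sims=50, seed=0):
    """Count factors whose observed eigenvalue exceeds the mean eigenvalue
    obtained from random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    real_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rand_eigs = np.mean(
        [np.sort(np.linalg.eigvalsh(
            np.corrcoef(rng.standard_normal((n, p)), rowvar=False)))[::-1]
         for _ in range(n_sims)], axis=0)
    return int(np.sum(real_eigs > rand_eigs))

# Simulated responses: two correlated item clusters of three items each.
rng = np.random.default_rng(1)
f = rng.standard_normal((500, 2))
data = np.hstack([f[:, [0]] + 0.5 * rng.standard_normal((500, 3)),
                  f[:, [1]] + 0.5 * rng.standard_normal((500, 3))])
print("factors suggested:", parallel_analysis(data))
```

With two well-separated clusters, the first two observed eigenvalues far exceed the random benchmark while the rest fall well below it, so the procedure suggests retaining two factors.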
The amount of overlap between the results of the retranslation exercise and the factor analysis was<br />
encouraging. Items from the Leadership Orientation scale defined a factor, and nearly all of the items<br />
that were retranslated into this scale had their highest loading on that factor (8 of 11). Similarly, Organization<br />
and Planning and NROTC/Military Interest and Motivation also defined their own factors.<br />
Achievement Motivation defined a factor, but most of the Responsibility (7 of 12) items also loaded on<br />
this factor. Dominance and Team Orientation each defined a factor, and the Sociability items were split<br />
between these two factors, with the Sociability items involving friendliness loading on the Team Orientation<br />
factor and those involving talkativeness and assertiveness loading on the Dominance factor. Clearly the<br />
retranslation and the factor analysis results converged on very similar sets of constructs.<br />
Comparison with “Honest” Sample<br />
The applicant sample generally chose more desirable response options (i.e., response options that led<br />
to higher scores) than the NROTC sample. For most of the PEC items, the difference between the mean<br />
item-level scores for the two groups was quite small. However, for a few items the difference was large,<br />
especially when the “correct” response was fairly obvious. These latter items are probably the most<br />
susceptible to distortion, and this information was considered in deciding which items to drop.<br />
Refining the PEC<br />
The results of the factor analysis and the retranslation exercise were both taken into account in defining<br />
the final set of PEC constructs. Where the retranslation and the factor analysis suggested a slightly<br />
different set of constructs, rational considerations guided formation of the final constructs. For example,<br />
although the Responsibility items were grouped with the Achievement Motivation items in the factor<br />
analysis, the literature review suggested that these two predictor constructs would be related to somewhat<br />
different criterion constructs. Therefore, Achievement Motivation and Responsibility were kept separate.<br />
Revisions were made to many of the PEC constructs based on the retranslation and the pilot test analyses,<br />
resulting in a final set of seven “revised” biodata/temperament constructs. These revised constructs are<br />
listed on the left side of Table 1.<br />
Table 1<br />
Descriptive Statistics for the Final (Shortened) PEC Scales<br />
Scale                                        Mean    SD   Reliability(1)<br />
Achievement Motivation                        .80    .49      .82<br />
Dependability                                1.29    .43      .59<br />
Social Comfort                               1.02    .47      .73<br />
Dominance                                     .87    .44      .82<br />
Leadership Orientation                        .51    .68      .78<br />
NROTC/Military Interest and Motivation        .68    .56      .73<br />
Organization and Planning                     .58    .56      .80<br />
Unlikely Virtues(2)                            --     --      n/a<br />
Miscellaneous(3)                               --     --      n/a<br />
Note. Ns range from 962 to 964 for means and standard deviations; from 898 to 953 for the reliabilities. (Computation of coefficient alpha<br />
required complete data.) [The number-of-items column of the original table is not legible in this copy.]<br />
1 Coefficient alpha.<br />
2 Descriptive statistics are not presented for the final Unlikely Virtues scale because it contains new items.<br />
3 The Miscellaneous category is not a scale, so descriptive statistics are not appropriate.<br />
After the new construct structure was delineated, each PEC item was assigned to a construct/scale<br />
according to its factor loadings and item content. Items that did not fit well into any construct were<br />
placed in a “Miscellaneous” category. Item-total correlations were then computed for these revised construct<br />
scales, and these were used to help guide decisions concerning which items to retain in the final<br />
(shortened) PEC.<br />
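Corrected item-total correlations (each item correlated with the sum of the remaining items in its scale, so the item's own variance does not inflate the value) are a standard way to run this kind of retention screen. The version below is a sketch on simulated data, since the PEC item responses are not reproduced here:

```python
import numpy as np

def corrected_item_totals(scale):
    """Correlation of each item with the sum of the *other* items in the scale."""
    totals = scale.sum(axis=1)
    return [float(np.corrcoef(scale[:, j], totals - scale[:, j])[0, 1])
            for j in range(scale.shape[1])]

# Simulated scale: three coherent items plus one nearly unrelated item.
rng = np.random.default_rng(2)
core = rng.standard_normal((200, 1))
scale = np.hstack([core + 0.6 * rng.standard_normal((200, 3)),
                   rng.standard_normal((200, 1))])
r_it = corrected_item_totals(scale)
print(["%.2f" % r for r in r_it])   # last value near zero -> drop candidate
```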
The final step in the present research was to shorten and refine the PEC. Decisions concerning<br />
which items to drop took into account the pilot test results, the comparison sample results, the retranslation<br />
results, and the item content. A few promising items were retained that did not fit welI into any of<br />
the predictor constructs and placed in a “Miscellaneous” category. A total of thirty-two content scale<br />
items were dropped. In addition, several of the Unlikely Virtues scale items were revised or replaced<br />
based on results from the comparison and pilot sample data analyses.<br />
The final (shortened) version of the PEC contains 116 items distributed across the final construct<br />
scales as shown in Table 1. Table 1 also presents descriptive statistics for these scales. All of the scales<br />
except Dependability have very good internal consistency reliability. The internal consistency of the<br />
Dependability scale is comparatively low, and the mean score on this scale is also quite high in the applicant<br />
sample. This scale was retained for further study in spite of these problems because, based on the<br />
literature review, it is expected to predict attrition. Descriptive statistics are not reported on Table 1 for<br />
the final Unlikely Virtues scale, because this scale contains new and revised items. Table 2 presents the<br />
intercorrelations among these final construct scale scores.<br />
Table 2<br />
Scale Intercorrelations for the Final (Shortened) PEC Scales<br />
                                              AM   DP   SC   DM   LO   NR   OP<br />
Achievement Motivation (AM)<br />
Dependability (DP)                           .56<br />
Social Comfort (SC)                          .32  .18<br />
Dominance (DM)                               .42  .29  .47<br />
Leadership Orientation (LO)                  .44  .28  .42  .56<br />
NROTC/Military Interest and Motivation (NR)  .42  .35  .25  .33  .31<br />
Organization and Planning (OP)               .58  .42  .19  .27  .31  .34<br />
Unlikely Virtues (UV)*                       .45  .30  .25  .33  .25  .35  .33<br />
Note. Ns range from 962 to 964.<br />
* Sum of 7 retained items.<br />
Conclusions<br />
The final experimental PEC measures seven biodata/temperament constructs. Each of the construct<br />
scales seems to be reasonably homogeneous and focused on the intended personal characteristics, experiences,<br />
and motivation constructs. The inventory has good potential for enhancing the prediction of<br />
NROTC performance and attrition. Further research is needed to evaluate the validity of the PEC for predicting<br />
performance and attrition.<br />
REFERENCES<br />
Doll, R. E. (1971). Item susceptibility to attempted faking as related to item characteristics and adopted<br />
fake set. Journal of Psychology, 77, 9-16.<br />
Montanelli, R. G., Jr., & Humphreys, L. G. (1976). Latent roots of random data correlation matrices<br />
with squared multiple correlations on the diagonal: A Monte Carlo study. Psychometrika, 41, 341-348.<br />
Neumann, I., Githens, W. H., & Abrahams, N. M. (1967). Development and evaluation of an officer<br />
potential composite (NPRDC TR 98-18). San Diego: Navy Personnel Research and Development<br />
Center.<br />
Owens-Kurtz, C. K., Gialluca, K. A., & Borman, W. C. (1989). Examination of the attrition coding system<br />
and development of potential attrition predictors for the Navy Reserve Officer Training Corps<br />
(NROTC) program (Institute Report No. 179). Minneapolis: Personnel Decisions Research Institutes, Inc.<br />
Rimland, B. (1957). The development of a fake-resistant test for selecting career-motivated NROTC<br />
scholarship recipients (PRFASD Report No. 112). San Diego: U.S. Naval Personnel Research<br />
Field Activity.<br />
PSYCHOLOGICAL APPLICATIONS TO ENSURING PERSONNEL SECURITY:<br />
A SYMPOSIUM<br />
BORMAN, W., University of South Florida and Personnel Decisions Research<br />
Institutes, Inc.;<br />
BOSSHARDT, M., DUBOIS, D., and HOUSTON, J., Personnel Decisions Research<br />
Institutes, Inc., Mpls., MN;<br />
CRAWFORD, K., Defense Personnel Security Research and Education Center,<br />
Monterey, CA;<br />
WISKOFF, M., and ZIMMERMAN, R., BDM International, Inc., Monterey, CA;<br />
SHERMAN, F., Marine Security Guard Battalion, Quantico, VA.<br />
The national security and financial consequences of unsuitable conduct<br />
and compromise of classified information by persons in sensitive or<br />
high security risk jobs are enormous. To protect against these types<br />
of unreliable behavior, a personnel security program is utilized by<br />
the Department of Defense. This program has two major emphases. The<br />
first involves screening individuals who are being considered for<br />
initial clearances. The second emphasis is the ongoing or continuing<br />
assessment of cleared personnel. With respect to initial screening,<br />
investigative interview procedures and background questionnaires to<br />
screen applicants are discussed. Regarding continuing assessment,<br />
current military service programs and an approach to assessing Marine<br />
Security Guard behavior are described. The symposium concludes with a<br />
discussion of some of the difficulties encountered by practitioners<br />
and new approaches to improve personnel security practices.<br />
THE INVESTIGATIVE INTERVIEW:<br />
A REVIEW OF PRACTICE AND RESEARCH<br />
David A. DuBois and Michael J. Bosshardt<br />
Personnel Decisions Research Institutes, Inc.<br />
Martin F. Wiskoff<br />
BDM International, Inc.<br />
Introduction<br />
An important element in safeguarding national security is maintaining personnel security.<br />
Each year thousands of individuals are assigned to jobs that provide them with access to<br />
extremely sensitive or classified information that could adversely affect national security. The<br />
Background Investigation (BI) is the primary method for screening personnel for positions<br />
requiring a Top Secret clearance. This method relies principally on obtaining information from<br />
an interview with the subject. Supplementary information is gathered through self-report<br />
background questionnaires, local agency checks, national agency checks, credit checks, and<br />
interviews with character and employment references. This information is evaluated against<br />
various administrative criteria by adjudicators.<br />
Objectives<br />
The objectives of this paper are to (1) describe the investigative interview, (2) review<br />
research related to the investigative interview, and (3) identify directions for future research in<br />
this area.<br />
Research Approach<br />
A literature review was conducted to identify empirical and descriptive studies related to<br />
the investigative interview. Specifically, computerized and manual literature searches were<br />
performed, as well as a telephone survey of experts from academia, industry, professional<br />
associations, and integrity test publishing companies. In addition, detailed information on<br />
investigative interview practices within the federal government was obtained through site visit<br />
interviews with 10 senior officials at five federal agencies.<br />
What is the investigative interview?<br />
The investigative interview is a method used for gathering information to determine the<br />
reliability of individuals for working in positions of trust or positions that provide access to<br />
extremely sensitive or classified information. Most investigative interviews are conducted by<br />
organizations within the military services, federal government, and defense industry. These<br />
interviews can involve either the subject or persons who know the subject (e.g., references, past<br />
employers). They are typically conducted by interviewers who are trained in interviewing<br />
methods and nonverbal communication techniques (e.g., kinesics, proxemics). The interview,<br />
which may be conducted using a variety of formats (e.g., structured, semi-structured,<br />
unstructured), typically covers topics such as honesty, substance abuse, emotional stability, and<br />
financial irresponsibility. This interview information is then summarized in narrative or rating<br />
form, and combined with other information about the subject (e.g., from self-report background<br />
questionnaires, local and national agency checks, credit checks). Senior adjudicators then make<br />
final screening decisions.<br />
Current Investigative Interview Practice<br />
Ten senior officials from five government organizations [Defense Investigative Service<br />
(DIS), the Office of Personnel Management (OPM), the Federal Bureau of Investigation (FBI),<br />
the Central Intelligence Agency (CIA), and the Defense Intelligence Agency (DIA)] were<br />
interviewed to obtain detailed information regarding current investigative interviewing practice<br />
for individuals being considered for Top Secret personnel security clearances. Each interview<br />
lasted 1 to 3 hours. A composite description of the major features of both subject and non-subject<br />
investigative interviews is presented below.<br />
Preparation<br />
Overall, the interview procedures followed by these agencies are remarkably similar in<br />
many respects. The interviewer generally prepares for the interview by reviewing available<br />
background information about the subject for missing, discrepant, and issue-oriented<br />
information. From this background information, specific interview questions are developed.<br />
Setting<br />
Subject and non-subject interviews are often conducted in different settings. Subject<br />
interviews are usually conducted in a government office setting, whereas non-subject interviews<br />
are less likely to be held in an office. In both types of interviews, privacy and freedom from<br />
distractions are the principal requirements for the interview setting.<br />
Conduct<br />
Guidelines for interviewer conduct are similar across agencies. These guidelines include<br />
acting in a professional manner, dressing in a businesslike manner, and being courteous,<br />
respectful, and non-judgmental.<br />
Format<br />
The investigative interview is conducted in four phases: introduction, background form<br />
review, issue development, and conclusion. Each phase is different in content and tone. The<br />
entire subject interview typically lasts from one-half hour to one hour.<br />
Introduction. During the introduction, the interviewer usually (although not always)<br />
shows credentials and positively identifies the subject. In this phase, the interviewer develops<br />
rapport with the subject, explains the interview purpose and format, and secures a verbal<br />
commitment from the subject to provide truthful and complete information.<br />
At some point during the subject interview, the interviewer informs the subject of the<br />
privacy act. This may be done at the beginning of the interview (e.g., DIS, OPM) or near the<br />
end of the interview (e.g., FBI, CIA). OPM subject interviews are conducted under oath. None<br />
of the other agency officials mentioned use of an oath, although DIS interviewers seek written<br />
signed statements when the subject provides significant derogatory information.<br />
Background Review. Following the introduction to the subject interview, the interviewer<br />
generally reviews the subject’s background history form. During this phase of the interview, the<br />
interviewer questions the subject about specific items on the form, emphasizing items that have<br />
been identified as omitted or discrepant during the preparation phase. A review of each item on<br />
the form is generally not undertaken.<br />
Issue Development. In the issue development phase, the interviewer systematically<br />
questions the subject on a range of topics. In most agencies, a standard list of topics is covered.<br />
These topics, which are similar across agencies, include education, employment, residence,<br />
alcohol, drugs, mental treatment, moral behavior, family and associates, foreign connections,<br />
foreign travel, financial responsibility, organizations, loyalty, criminal history, handling<br />
information, and trust. Coverage of interview topics generally begins with questions on the<br />
subject’s background (e.g., education, employment) and later proceeds into the more sensitive<br />
areas.<br />
Conclusion. The concluding phase of the interview is focused on answering any<br />
concerns of the interviewee. The next steps of the security clearance process are also explained<br />
at this time.<br />
Interview Procedures<br />
A variety of techniques are used to facilitate the investigative interview process.<br />
Interviewers are typically trained in four general categories of interviewing skills: motivation,<br />
questioning, observation, and listening.<br />
Motivation. Subjects are motivated to disclose sensitive information to the interviewer<br />
in several ways. The interviewer ensures that the interviewee understands the purpose, format,<br />
and content of the interview. The “whole person” concept of adjudication is explained so that<br />
the interviewee understands that negative information is judged in terms of the circumstances of<br />
the situation, how long ago it happened, etc., and in terms of the positive qualities of the person.<br />
The interviewee is informed of the consequences of omitting or providing misleading<br />
information. The interviewer typically secures a verbal commitment to provide complete and<br />
truthful information.<br />
Rapport is maintained by displaying a non-judgmental attitude, fairness, and respect.<br />
Objections are managed by clearly identifying the nature of the objection or hesitation, re-stating<br />
it to the interviewee, and addressing concerns directly.<br />
Questioning. Several questioning approaches are used in conducting subject interviews.<br />
Although the topic areas are generally structured, only DIS emphasizes use of a structured set of<br />
questions for each topic. DIS interviewers typically ask four to seven short, direct questions<br />
regarding a subject area, followed by summarizing questions. Other agencies use more open-ended<br />
questions, followed with summary or verification questions. Interrogative questioning<br />
methods are not generally used by DIS, but are occasionally used by FBI interviewers.<br />
Observation. All of the agencies visited train their interviewers to look for possible<br />
verbal and nonverbal cues to deception on the part of the interviewee. Most of these indicators<br />
are based on noticing patterns of various verbal, paralinguistic, and nonverbal (body gestures,<br />
facial expression) indicators. When possible deception is detected, the interviewer may remind<br />
the subject of the importance of honesty, and that confidentiality is maintained.<br />
Listening. Interviewers are trained to listen to the whole response, to use active listening<br />
procedures, and to follow up vague responses with questions that draw out details. Techniques<br />
such as re-statement and paraphrasing are used to encourage elaboration.<br />
Documentation<br />
Investigators normally take only limited (or no) notes during the interview. OPM<br />
interviewers tend to take the most extensive notes, while FBI interviewers generally take fewer<br />
notes. Upon completion of the interview, interviewers write or dictate a short report<br />
summarizing the results of the interview.<br />
Decision-Making<br />
In all agencies, interviewers obtain the interview information but adjudicators make the<br />
clearance decisions. OPM is unique in that it conducts interviews on a contract basis for over 90<br />
Federal agencies.<br />
Empirical Research<br />
No published empirical studies were found regarding the use of investigative or integrity<br />
interviews. The literature search did identify five unpublished studies, most of which were pilot<br />
studies.<br />
The most relevant of these compared the relative effectiveness of two types of<br />
background investigations--one with a subject interview and one without. Conducted by the<br />
Defense Investigative Service (Office of Personnel Investigations, 1986), the study involved a<br />
random sample of 471 military members, contractor employees, and DOD civilian personnel.<br />
For the 186 cases in which significant adverse information was identified, the background<br />
investigation which included the investigative interview developed significant information in<br />
164 of these cases. Furthermore, the procedure which included the investigative interview<br />
yielded 72 cases not identified by the traditional procedure. Based on these results, the research<br />
staff concluded that inclusion of the subject interview resulted in a significant improvement in<br />
the background investigation procedure.<br />
A survey by the Director of Central Intelligence (Office of Personnel Investigations,<br />
1986) of 12 government agencies examined the productivity of various sources for the purposes<br />
of applicant screening and security clearances. Background investigation sources included in<br />
this study were subject interviews, neighbor interviews, education and employment record<br />
checks, national agency checks, and the polygraph. The results of the study suggested that the<br />
subject interview was the second most productive source for identifying serious adverse<br />
information.<br />
Flyer (1986) summarized much of the early personnel security screening literature<br />
conducted in the military. Although no data were presented, he noted that the most important<br />
finding of Air Force research on personnel security screening was “the unique and considerable<br />
value of the subject interview.”<br />
In summary, the limited research on investigative interviews suggests that they may be<br />
useful personnel security screening devices.<br />
Related Research<br />
Although the research on investigative interviews is scarce, there is a wealth of research<br />
on interviewing in other contexts (e.g., employment, survey research). This research is useful to<br />
the extent that it suggests additional techniques to apply in the investigative interview setting or<br />
provides a theoretical model that explains interviewing behavior.<br />
For example, with respect to question characteristics, research examining eyewitness<br />
testimony (Lipton, 1977) compared the relative effectiveness of open-ended vs. close-ended<br />
questions. The results indicated that narrative, open-ended formats tend to produce very<br />
accurate, but incomplete information. Close-ended, interrogatory formats, on the other hand,<br />
tend to produce more complete, but less accurate information. This led one researcher (Loftus,<br />
1982) to suggest that open-ended questions be used first, followed by specific (close-ended)<br />
questions to ensure that complete information is obtained.<br />
The decades of research on employment interviews and the more recent research on the<br />
detection of deception provide a rich source of ideas for improving investigative interviewing<br />
procedures. Many of these ideas have been recently summarized in a review of investigative<br />
interviewing and related research (Bosshardt, DuBois, Carter, & Paullin, 1989).<br />
While these large scientific literatures on related interviewing techniques can provide<br />
many ideas, there is a strong need to thoroughly investigate the utility of these ideas in the<br />
investigative interview setting before adopting them in practice. A careful consideration of the<br />
very different contexts that exist between the investigative interview and other interview settings<br />
suggests that results may not generalize, or that the effects may not be the same.<br />
For example, the purpose of the investigative interview is to screen out people, while the<br />
purpose of the employment interview is to select in personnel. The rejection rate for<br />
investigative interviews is about 1% to 5%, while the selection ratio for employment interviews<br />
is typically about 20% to 60%. The focus of investigative interviews is on behavioral constructs<br />
such as behavioral unreliability and unsuitability, while employment interviews focus on cognitive<br />
ability, motivation, and communication skills. Perhaps most importantly, the motivational<br />
approach used is very different in these two settings. The consequence of providing complete<br />
information in an investigative interview is the avoidance of punishment, whereas for the<br />
employment interviewee it is the reward of a job.<br />
Needed Research<br />
Although the interview has been extensively studied as a method of gathering<br />
information, little research is available regarding its use in the investigative interview setting. A<br />
variety of investigative interviewing procedures are currently in use and the large literature on<br />
other interview settings suggests additional procedures to consider. Research is needed to<br />
systematically evaluate the effectiveness of these various investigative interview methods.<br />
One major finding from employment interview research can probably be generalized to<br />
the investigative interview--the most impressive gains in interview validity result from the<br />
systematic study of the performance criteria that are to be predicted. Research that defines the<br />
psychological dimensions and behavioral detail of security-relevant performance can contribute<br />
to significant improvements in interviewer training, the assessment of personnel security risks,<br />
and the prediction of unreliable behavior.<br />
REFERENCES<br />
Bosshardt, M.J., DuBois, D.A., Carter, G.W., & Paullin, C. (1989). The investigative interview:<br />
A review of practice and related research (Technical report No. 160). Minneapolis, MN:<br />
Personnel Decisions Research Institute.<br />
Flyer, E.S. (1986). Personnel security research: Prescreening and background investigations<br />
(Report No. HumRRO-FR-86-01). Alexandria, VA: HumRRO International, Inc.<br />
Lipton, J.P. (1977). On the psychology of eyewitness testimony. Journal of Applied<br />
Psychology, 62(1), 90-95.<br />
Loftus, E. (1982). Interrogating witnesses--good questions and bad. In R. M. Hogarth (Ed.),<br />
Question framing and response consistency. San Francisco: Jossey-Bass.<br />
Office of Personnel Investigations. (1986). Subject interview study: Phase I report.<br />
Washington, D.C.: U.S. Office of Personnel Management.<br />
UTILITY OF A SCREENING QUESTIONNAIRE FOR<br />
SENSITIVE MILITARY OCCUPATIONS<br />
Ray A. Zimmerman<br />
Martin F. Wiskoff<br />
BDM International, Inc.<br />
Background<br />
Each of the military services prescreens enlisted<br />
applicants for sensitive occupations, i.e., those that<br />
require a Top Secret clearance, access to Sensitive<br />
Compartmented Information, or are included in the<br />
Nuclear Weapons Personnel Reliability Program. The<br />
prescreening occurs prior to the initiation of the<br />
Personnel Security Investigation (PSI) and is designed<br />
to: (a) reduce the probability of assigning unreliable<br />
individuals to sensitive positions and (b) cull out<br />
individuals who are likely to be denied a security<br />
clearance. Crawford and Wiskoff (1988), in their review<br />
of the prescreening procedures used by the military<br />
services, found that they had been developed without<br />
empirical assessment of their validity and utility. As an<br />
example, they pointed out that despite intensive<br />
prescreening, the discharge rate from military service<br />
for reasons of unsuitability was not much lower for<br />
high-security occupations than that for other military<br />
jobs.<br />
The security interview at the Military Entrance<br />
Processing Station (MEPS) is the first step in the<br />
prescreening process for enlisted Army applicants to a<br />
sensitive job. Prior to the interview, applicants<br />
complete the Army Security Screening Questionnaire<br />
(DAPC-EPMD FORM 169-R). Responses to the<br />
questionnaire are examined by a security interviewer<br />
and explored further during an interview with the<br />
applicant. For those applicants who are accepted into<br />
a sensitive job and placed into the Delayed Entry<br />
Program (DEP), a second 169-R is completed and an<br />
interview conducted upon completion of the DEP.<br />
Purpose<br />
The purpose of this investigation was to explore<br />
the effectiveness of the 169-R as a security<br />
prescreening instrument, in terms of: (a) the degree to<br />
which it is able to predict two operational screening<br />
decisions and a measure of personnel reliability and<br />
(b) the utility or impact of using the information it<br />
provides, along with other applicant data. The study<br />
was preliminary in that only a small sample was<br />
analyzed to determine whether it would be fruitful to<br />
conduct a large scale study. A more complete<br />
discussion of the study and the results is available in<br />
Zimmerman, Fitz, Wiskoff, and Parker (in press).<br />
Sample<br />
Army Security Screening Questionnaires filled out<br />
by applicants from 1981 through 1986 were collected<br />
from MEPS throughout the country. Only the<br />
questionnaires completed during 1984 were used<br />
because: (a) the questionnaire had been revised<br />
several times during the years prior to 1984 and<br />
(b) individuals completing questionnaires after 1984<br />
would not have had the opportunity to finish their first<br />
term of service. Questionnaires were available for<br />
2,870 applicants. From these a random sample of 281<br />
non-prior service males was drawn. Analyses<br />
indicated that the sample appears to match the<br />
population of 1984 applicants to high security<br />
occupations fairly well in terms of Armed Forces<br />
Qualification Test (AFQT) scores and demographic<br />
variables such as race, age at service entry and level<br />
of education.<br />
Predictor Measures<br />
The Army 169-R administered in 1984 consists of<br />
a series of 45 questions which can be answered “yes”<br />
or “no,” relating to: (a) Prior Military and Federal<br />
Service, (b) Foreign Connections, (c) Drug Use,<br />
(d) Alcohol Use, (e) Emotional Stability, (f) Sexual<br />
Misconduct, (g) Financial Problems, (h) Employment<br />
Problems, (i) Delinquency, and (j) Legal Offenses. For<br />
each affirmative response, the applicant must provide<br />
details of the specific incidents or experiences. In<br />
addition, applicants must supply detailed information<br />
about current financial obligations and any previous<br />
arrests, citations, or other types of contact with the<br />
legal system. Most applicants can complete the 169-R<br />
in approximately one-half hour.<br />
For this study, two classes of predictors were<br />
taken from the 169-R: (a) yes/no items and<br />
(b) detailed information that was transformed into<br />
coded items. There were 50 coded items analyzed as<br />
predictors.<br />
Other applicant data that are available at the time<br />
of the security interview were examined in conjunction<br />
with 169-R responses. These additional predictors<br />
included AFQT category, age at entry into the Army<br />
and level of education. The data were obtained from<br />
personnel records available at the Defense Manpower<br />
Data Center (DMDC).
Criterion Measures<br />
Crawford and Trent (1987) note that in personnel<br />
security research, the focus is on whether an individual<br />
demonstrates reliability, trustworthiness, good judgment<br />
and loyalty in the actual handling and use of classified<br />
information. Failure of the individual could be<br />
manifested at one level in excessive security violations<br />
and at the extreme in the deliberate compromise of<br />
classified information, including espionage.<br />
Fortunately, compromise and espionage exhibit<br />
a very low base rate. Security violations, while more<br />
frequent, also show a low base rate, and in addition<br />
information on commission of violations is not available<br />
in centralized data bases.<br />
Three alternative criteria were used in this study:<br />
1. Prescreening adjudication decision. This<br />
decision is made at the MEPS after the applicant has<br />
completed the 169-R and the security interview. The<br />
security interviewer, after consultation with security<br />
personnel within his/her chain of command, determines<br />
whether the applicant should be allowed to continue<br />
processing for a sensitive occupation. Many of the<br />
rejected applicants enter the Army in non-sensitive<br />
occupations and receive lower level security<br />
clearances. Historically, approximately 33 to 47% of<br />
applicants are rejected at this stage of processing.<br />
2. Issue Case status. If derogatory information<br />
is discovered during the course of the PSI, the<br />
investigation is expanded and designated as an “issue<br />
case.” This designation indicates, in most instances,<br />
that there is some evidence of a blemish in an<br />
individual’s behavior, associations, etc. that may be a<br />
cause to question his/her qualifications to handle<br />
classified material. Issue case status has been<br />
employed as an operational criterion in previous<br />
studies (Crawford and Trent, 1987; Wiskoff and<br />
Dunipace, 1988). Data concerning issue case status<br />
were obtained from the Defense Central Index of<br />
Investigations (DCII), a copy of which is maintained for<br />
research purposes at DMDC.<br />
3. Type of discharge. This variable refers to<br />
whether or not the individual was discharged from the<br />
Army for reasons of unsuitability. Unsuitability attrition<br />
is operationally defined as those accessions listed on<br />
the DMDC Cohort File having inter-service separation<br />
codes 60-87 for failure to meet minimum behavioral or<br />
performance standards. Type of discharge has been<br />
used in many studies of military service attrition.<br />
Analyses<br />
Only the data from the second administration of<br />
the 169-R were used for individuals who had<br />
completed the form twice, i.e. entering and leaving<br />
DEP. This was necessary, because for these<br />
individuals, the final prescreening adjudication measure<br />
represents a decision that is based on information from<br />
the second set of responses.<br />
The first set of analyses focused on the validity of<br />
the instrument. First, a series of correlational analyses<br />
was conducted to examine the relationship between<br />
each of the yes/no and coded items and the criterion<br />
measures. Next, empirical scoring keys for each of the<br />
criteria were developed using the horizontal percent<br />
method (Guion, 1965). The total score for each key<br />
was subsequently correlated with each criterion<br />
measure. In addition, AFQT category and age at entry<br />
into the Army were examined for their incremental<br />
validity in predicting issue case status and type of<br />
discharge. Level of education could not be used<br />
because there were too few individuals who did not<br />
have a high school diploma.<br />
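The empirical keying step can be sketched in code. The following is a minimal illustration of the horizontal percent method (Guion, 1965); the item data, criterion values, and function names are invented for demonstration and are not the 169-R items or the study's actual weights.

```python
# Illustrative sketch of the horizontal percent method (Guion, 1965) for
# building an empirical scoring key from yes/no items. All data below are
# toy values, not taken from the 169-R study.

def horizontal_percent_key(responses, criterion):
    """Weight each response option of each item by the percentage of the
    respondents choosing that option who are criterion-positive (the
    percent is taken 'horizontally' across the criterion split)."""
    n_items = len(responses[0])
    key = []
    for i in range(n_items):
        weights = {}
        for option in ("yes", "no"):
            group = [c for r, c in zip(responses, criterion) if r[i] == option]
            weights[option] = 100.0 * sum(group) / len(group) if group else 0.0
        key.append(weights)
    return key

def total_score(key, response):
    """Sum the option weights across items for one applicant."""
    return sum(key[i][opt] for i, opt in enumerate(response))

# Toy data: three yes/no items; criterion 1 = later became an issue case.
responses = [("yes", "no", "no"),
             ("no", "no", "yes"),
             ("yes", "yes", "no"),
             ("no", "no", "no")]
criterion = [1, 0, 1, 0]

key = horizontal_percent_key(responses, criterion)
```

Note that with this keying a higher total score indicates higher criterion risk; an operational key could equally rescale or reverse the weights so that low scores flag risk, as in the cutoff analysis reported below.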
The second set of analyses examined the utility of<br />
decisions based on cutoff scores for the empirical<br />
scoring keys. Utility was assessed by examining the<br />
percentage of individuals that can be identified and<br />
screened out using the empirical scoring keys and their<br />
associated cutoff scores, for different combinations of<br />
AFQT and age at entry categories.<br />
Results<br />
It is important, in examining the findings of this<br />
study, to note that the data for the three criterion<br />
measures do not represent the progression of a single<br />
cohort through the screening process. That is, each<br />
applicant’s predictor data were matched to his/her<br />
criterion data without regard for how the person fared<br />
on the other criteria. For instance, it is possible for an<br />
applicant to have been screened out of a sensitive job<br />
during the prescreening adjudication and still have<br />
criterion data on type of discharge, as long as the<br />
person did enlist in the Army in a non-sensitive<br />
occupation.<br />
In reviewing the relationships of the individual<br />
items to the criteria, it should be remembered that<br />
some types of negative behavior are relatively rare or<br />
are not often admitted. This low base rate for an item<br />
serves to restrict the variance of the variable and<br />
attenuate its correlation with the criterion. Overall, 11<br />
items showed statistically significant relationships with<br />
prescreening adjudication, three with issue case status,<br />
and only one with type of discharge. Drug use and<br />
financial problems were the two content areas with the<br />
most significant relationships.<br />
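The variance-restriction point can be made concrete with a standard psychometric bound that the paper does not state explicitly: the maximum attainable phi correlation between two dichotomous variables is limited by their marginal proportions. A brief sketch, with illustrative endorsement and base rates:

```python
import math

# Standard psychometric bound (not from the paper): for two dichotomous
# variables with marginal proportions p and q (p <= q), the maximum
# attainable phi coefficient is sqrt(p * (1 - q) / (q * (1 - p))).
# Rarely endorsed items therefore cannot correlate highly with a
# low-base-rate criterion, no matter how diagnostic they are.

def phi_max(p_item, p_crit):
    """Upper bound on phi given an item endorsement rate and a criterion
    base rate (both strictly between 0 and 1)."""
    p_small, p_large = sorted((p_item, p_crit))
    return math.sqrt(p_small * (1 - p_large) / (p_large * (1 - p_small)))

# Equal 50/50 marginals permit a perfect correlation...
print(phi_max(0.50, 0.50))  # 1.0
# ...but a 2%-endorsed item against an 8% issue-case base rate is capped
# well below 1 (roughly .48 here), which attenuates observed item validities.
print(phi_max(0.02, 0.08))
```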
The validity coefficients for the empirical scoring<br />
keys and the regression models (including the<br />
empirical keys and additional applicant data) are<br />
displayed in Table 1. Each key shows a significant<br />
correlation with the criterion it was designed to predict.<br />
Both the prescreening adjudication key and the issue<br />
case status key had fairly strong correlations with<br />
prescreening adjudication and issue case status. Only<br />
the type of discharge scoring key was significantly<br />
correlated with all three criteria, although the r’s only<br />
ranged from .12 to .15.
Table 1<br />
Validity Coefficients for Empirical Scoring Keys and Regression Models<br />
<br />
                                           Prescreening   Issue Case   Type of<br />
                                           Adjudication   Status       Discharge<br />
Empirical Scoring Key<br />
  Prescreening Adjudication                    a**           .21**       -.02<br />
  Issue Case Status                            .25**         .27**       -.02<br />
  Type of Discharge                            .12*          .15*        .15*<br />
Regression Model<br />
  Issue Case Status key, AFQT, and Age                       .27**<br />
  Type of Discharge key, AFQT, and Age                                   .22**<br />
* p < .05   ** p < .01<br />
Regression analyses were performed to examine<br />
the incremental validity of the additional applicant data.<br />
AFQT was collapsed into high (I-IIIA) and low (IIIB or<br />
below) categories. Age was collapsed into three<br />
categories: (a) 17 year olds, (b) 18-20 year olds, and<br />
(c) 21 year olds or older. In Table 1 it is seen that<br />
there is no evidence of incremental validity in predicting<br />
issue case status by including AFQT category or age<br />
at entry. However, for type of discharge, the validity<br />
coefficient increases from .15 to .22 with the addition of<br />
these variables.<br />
Figure 1 displays the 169-R items that are<br />
included within the scoring keys for each of the criteria.<br />
Four of the items, i.e. times marijuana use, times<br />
intoxicated, visits for nervous, emotional, mental<br />
counseling and suspended/expelled from school<br />
appear in all three scoring keys. Three other items are<br />
in two of the keys, while the remaining nine are only<br />
found in one of the keys.<br />
                                                     Prescreening    Issue Case    Type of<br />
169-R Item                                           Adjudication    Status        Discharge<br />
Times marijuana use                                  ✓ ✓ ✓<br />
Frequency of marijuana use                           ✓ ✓<br />
Used hard drugs                                      ✓<br />
Possessed, transported, grown, produced, etc., drugs ✓ ✓<br />
Transported, sold, etc., alcohol                     ✓<br />
Times intoxicated                                    ✓ ✓ ✓<br />
Frequency of alcohol usage                           ✓<br />
Visits for nervous, emotional, mental counseling     ✓ ✓ ✓<br />
Pregnant or caused pregnancy                         ✓<br />
Written bad checks                                   ✓<br />
Made delinquent payments                             ✓<br />
Experienced financial problems                       ✓<br />
Left job under less than favorable conditions        ✓ ✓<br />
Suspended/expelled from school                       ✓ ✓ ✓<br />
Unsafe vehicle/licensing violations                  ✓<br />
Ran away or considered running from home             ✓<br />
Figure 1. Form 169-R items included in empirical scoring keys<br />
The final analysis looked at the utility of the<br />
scoring keys, defined as a reduced risk of:<br />
(a) having a security clearance denied to an individual<br />
who has been assigned to a sensitive duty position<br />
and (b) assigning unreliable individuals to sensitive<br />
duty positions. The utility was evaluated by first<br />
establishing cutoff scores and then determining what<br />
the impact would have been if the empirical keys and<br />
cutoff scores had been used in prescreening.<br />
The goal, in setting the cutoff scores, was to<br />
screen out individuals with low scores on the empirical<br />
keys and yet fulfill existing manpower requirements. In<br />
this sample, 19% of the non-prior service male<br />
applicants were rejected in the prescreening<br />
adjudication phase. Thus, cutoff scores were<br />
established for the three scoring keys at the point<br />
closest to the 19%/81% split.<br />
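The cutoff-setting step can be sketched as follows. The scores below are invented and the tie-handling is an assumption, since the paper states only that the cutoff closest to the 19%/81% split was chosen.

```python
# Illustrative sketch of choosing a cutoff score at the point closest to an
# operational rejection rate (the study's 19%/81% split). The score values
# are invented; the paper does not publish key-score distributions.

def choose_cutoff(scores, target_reject=0.19):
    """Return the cutoff whose below-cutoff fraction is closest to the
    target rejection rate (scores strictly below the cutoff are rejected;
    the first candidate is kept on ties)."""
    best_cut, best_gap = None, 2.0
    for cut in sorted(set(scores)):
        reject_rate = sum(s < cut for s in scores) / len(scores)
        gap = abs(reject_rate - target_reject)
        if gap < best_gap:
            best_cut, best_gap = cut, gap
    return best_cut

# With 100 evenly spread scores, the chosen cutoff rejects exactly 19%.
scores = list(range(100))
cut = choose_cutoff(scores)
rejected = sum(s < cut for s in scores)
```

Once the cutoff is fixed, utility can be read off directly: compare the criterion-positive rate among above-cutoff cases with the overall base rate, as Table 2 does.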
Table 2<br />
Impact of Using Cutoff Scores on the Issue Case and Unsuitability Discharge Rates<br />
<br />
                                            Issue Case Status       Type of Discharge<br />
Empirical                                   Percent   Percent      Percent      Percent<br />
Scoring Key                   Score         Issue     No Issue     Unsuitable   Normal<br />
Prescreening Adjudication     Below cutoff   25.9      74.1         11.5         88.5<br />
                              Above cutoff    5.6      94.4         14.0         86.0<br />
Issue Case Status             Below cutoff   22.2      77.8          9.4         90.6<br />
                              Above cutoff    5.3      94.7         14.5         85.5<br />
Type of Discharge             Below cutoff   16.1      83.9         24.4         75.6<br />
                              Above cutoff    6.7      93.3         11.7         88.3<br />
Regression model with         Below cutoff                          28.0         72.0<br />
  Type of Discharge           Above cutoff                          10.4         89.6<br />
Base Rate                                     8.0      92.0         13.5         86.5<br />
Table 2 shows the impact of using the three keys<br />
in terms of reducing the issue case and unsuitability<br />
discharge rates. The base rate for issue cases in this<br />
sample was 8.0%. The percentages of issue cases<br />
above the cutoff were lower than the base rate for all<br />
three keys, with the issue case status key showing the<br />
lowest percentage (5.3%). Thus, the issue case rate<br />
could be reduced by approximately three percentage<br />
points by using this key. Analysis of DCII data<br />
revealed that 289 of the non-prior service males who<br />
entered high security occupations in the Army in 1984<br />
became classified as issue cases. Thus, approximately<br />
98 of these individuals would not have been allowed<br />
into high security occupations if the issue case status<br />
scoring key and its cutoff had been used for<br />
prescreening.<br />
The base rate for applicants who received<br />
unsuitability discharges was 13.5%. Table 2 shows<br />
that the greatest reduction in this rate occurs with the<br />
use of the Type of Discharge key plus the<br />
supplementary predictors, i.e. AFQT category and age<br />
at service entry. At a cutoff score closest to the<br />
19%/81% split, the percentage of unsuitability<br />
discharges would have been reduced to 10.4%, slightly<br />
more than three percentage points below the base rate.<br />
This translates into 99 unreliable individuals who would<br />
have been screened out.<br />
Conclusions and Recommendations<br />
The major caveat in deriving operational<br />
conclusions from the findings of this study was the<br />
relatively small sample size. Other problems which are<br />
discussed in Zimmerman et al. (in press) are:<br />
(a) criterion issues such as the relevance of the criteria<br />
to personnel security decisions and the existence of<br />
false negatives (e.g., individuals classified as issue<br />
cases who are granted their security clearances) and<br />
false positives (e.g., individuals who are never<br />
classified as issue cases yet are turned down<br />
for a security clearance); and (b) the impact of low base<br />
rates in both the predictors and criteria.<br />
Despite these caveats, further research on the<br />
169-R, using a large data sample, seems to be<br />
warranted for two reasons. First, the findings of this<br />
report clearly indicate the utility or benefit of using<br />
empirical scoring keys to supplement existing<br />
prescreening procedures based on the 169-R.<br />
Second, for many predictor variables from the 169-R,<br />
cell sizes were too small to compute a valid measure<br />
of association. If all available data for an entire year<br />
were analyzed, more definitive results could be<br />
obtained.<br />
In addition to analyzing a larger data sample, a<br />
potentially fruitful avenue is the revision of the 169-R to<br />
increase its validity.<br />
Note: Since the completion of this study, research has<br />
been initiated on a much larger sample of 169-R forms<br />
completed by applicants in 1986. In addition, a<br />
revision of the 169-R has been developed jointly by the<br />
Defense Personnel Security Research and Education<br />
Center and the U. S. Total Army Personnel Command,<br />
and was operationally implemented on 1 October 1990.<br />
References<br />
Crawford, K. S., & Trent, T. (1987). Personnel security<br />
prescreening: An application of the Armed<br />
Services Applicant Profile (ASAP) (PERS-TR-87-<br />
003). Monterey, CA: Defense Personnel Security<br />
Research and Education Center.<br />
Crawford, K. S., & Wiskoff, M. F. (1988). Screening<br />
enlisted accessions for sensitive military jobs<br />
(PERS-TR-89-001). Monterey, CA: Defense<br />
Personnel Security Research and Education Center.<br />
Guion, R. M. (1965). Personnel testing. New York:<br />
McGraw-Hill.<br />
Wiskoff, M. F., & Dunipace, N. E. (1988). Moral waivers<br />
and suitability for high security military jobs (PERS-<br />
TR-88-011). Monterey, CA: Defense Personnel<br />
Security Research and Education Center.<br />
Zimmerman, R. A., Fitz, C. C., Wiskoff, M. F., & Parker,<br />
J. P. (in press). Preliminary analysis of the U. S.<br />
Army Security Screening Questionnaire (PERS-TN-<br />
90-008). Monterey, CA: Defense Personnel<br />
Security Research and Education Center.
Continuing Assessment of Cleared Personnel in the <strong>Military</strong> Services<br />
Michael J. Bosshardt<br />
David A. DuBois<br />
Personnel Decisions Research Institutes, Inc.<br />
Kent S. Crawford<br />
The Defense Personnel Security Research and Education Center<br />
Problem and Background<br />
Examination of recent espionage cases suggests that few spies enter<br />
government service with the intent to commit espionage. Instead, most<br />
individuals become spies as a result of personal and situational factors<br />
that occur after they receive a personnel security clearance. This<br />
suggests that an ongoing program of continuing assessment (CA) for cleared<br />
personnel should be an important component of the personnel security<br />
process.<br />
Two other factors underscore the importance of the CA program. First,<br />
initial clearance screening procedures tend to be costly, involve<br />
conditions of very low base rates, and have unknown validity. Second,<br />
hostile intelligence activities probably focus more effort on currently<br />
cleared personnel than on uncleared individuals.<br />
Despite its importance and the fact that formal CA programs have been in<br />
existence for a number of years, little is known about operational CA<br />
programs (DOD Security Review Commission, 1985). In order to address this<br />
deficiency, a project was initiated to evaluate how well CA programs are<br />
operating in the military services. The principal activities in this<br />
project included a review of regulations and literature related to CA<br />
(DuBois & Bosshardt, 1990), a survey of personnel at 60 Army, Air Force,<br />
Navy, and Marine Corps installations worldwide to obtain detailed<br />
information about CA programs (Bosshardt, DuBois, Crawford, & McGuire,<br />
1990; Bosshardt, DuBois, & Crawford, 1990a), and an analysis of systems<br />
issues related to CA (Bosshardt, DuBois, & Crawford, 1990b).<br />
Objectives<br />
The objectives of this paper are to (1) present some of the key findings of<br />
this survey of CA programs and (2) provide a preliminary assessment of the<br />
effectiveness of these programs.<br />
Approach<br />
The initial step in the study involved a review of regulations and<br />
literature related to CA. We then conducted a series of meetings with<br />
service branch headquarters and adjudication officials to gain a further<br />
understanding of CA policies and programs. Following this, nine military<br />
516
installations were visited to obtain an understanding of operational CA<br />
programs in the military and to gather information necessary for developing<br />
the survey research approach.<br />
These research activities led to the development of three preliminary<br />
survey forms. The principal form was a structured interview protocol for<br />
installation security office representatives. Two shorter survey forms<br />
were also developed for unit security managers and unit commanders.<br />
Preliminary versions of these forms were reviewed by several CA experts and<br />
pilot tested prior to actual survey administration.<br />
The survey forms were administered between September, 1989 and January,<br />
1990. The sample included 60 sites (21 Air Force, 19 Army, 18 Navy, and 2<br />
Marine Corps). Forty-eight were sites where individuals primarily had<br />
collateral access (i.e., top secret, secret, or confidential access) and 12<br />
were sites where individuals primarily had SCI access; ten were overseas<br />
sites. Overall, completed survey forms were received from 60 installation<br />
security managers, 126 unit security managers, and 88 unit commanders.<br />
Results and Discussion<br />
The structured interview protocol for installation security managers<br />
included approximately 60 open-ended questions and numerous rating items.<br />
Two key issues concern the best sources of CA-relevant information and the<br />
most frequently reported types of CA information. Data concerning both<br />
issues are presented below.<br />
Sources of CA Information. Installation security managers were asked to<br />
rate the willingness of various groups to share derogatory information of<br />
security relevance with the security office. The results indicated that<br />
the military police, the clearance adjudication facility, and the<br />
investigations office are among the most willing to share information with<br />
the security office. Several types of installation personnel (e.g.,<br />
installation commanders, unit commanders, unit security managers) received<br />
moderate to high ratings. Most installation departments (e.g., medical,<br />
personnel, legal) and non-installation groups (e.g., local civilian police,<br />
federal agencies) were perceived as only moderately willing to share<br />
derogatory CA information. Employee assistance groups received relatively<br />
low ratings. Not surprisingly, coworkers and subjects were rated as least<br />
willing to share derogatory information.<br />
Types of CA Information Reported. Installation security managers estimated<br />
the number of valid derogatory incidents reported to their security office<br />
during the past year for each of 12 types of information. The mean number<br />
of reported incidents (per 1000 cleared individuals) for various areas is<br />
shown in Table 1.<br />
A complete summary of all results is presented in Bosshardt, DuBois,<br />
Crawford, and McGuire (1990).<br />
Table 1<br />
Mean Estimated Number of Valid Derogatory Incidents Reported to Collateral<br />
and SCI Installation Security Offices During the Past Twelve Months<br />
(Per 1000 Cleared Individuals)<br />
                                                       Collateral      SCI<br />
Type of Reported Incident                                   Sites    Sites<br />
Alcohol abuse                                                12.1<br />
Other incidents (e.g., non-judicial punishments)              9.5<br />
Drug abuse                                                    6.6<br />
Criminal felony acts not covered in other categories          3.4<br />
Financial problems                                            3.1<br />
Court martials/desertions                                     3.1<br />
Falsification of information acts<br />
Emotional/mental/family problems<br />
Security violation incidents                                  2.1<br />
Sexual misconduct                                             1.6<br />
Foreign associations/travel incidents<br />
Disloyalty to the U.S.<br />
Note. The samples include 43 collateral sites and 12 SCI sites.<br />
The results in Table 1 suggest that alcohol abuse and other incidents<br />
(e.g., NJPs) are the most frequently reported areas at both collateral and<br />
SCI sites. Overall, the average number of reported incidents across all<br />
incident categories (per 1000 cleared individuals) is 46.9 for collateral<br />
sites and 42.3 for SCI sites.<br />
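The per-1000 normalization behind Table 1 is easy to state directly. In the sketch below, the example counts (29 reports at a site with 2,400 cleared individuals) are invented for illustration; only the formula itself comes from the text:<br />

```python
# Per-1000 normalization used for the Table 1 means.
# The example counts (29 reports, 2400 cleared individuals) are invented;
# only the formula (reports / cleared personnel x 1000) is from the text.

def incidents_per_1000(reported: int, cleared: int) -> float:
    return reported / cleared * 1000.0

# A site this size with 29 alcohol-abuse reports matches the 12.1 collateral mean.
print(round(incidents_per_1000(29, 2400), 1))  # 12.1
```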
The CA survey yielded a considerable amount of quantitative and qualitative<br />
data. In addition to the interview data provided by installation security<br />
managers, four types of data were gathered: (1) ratings by installation<br />
security managers, unit security managers, and unit commanders of 136<br />
obstacles in maintaining an effective CA program, (2) write-in responses<br />
(n=684) by these three groups regarding the major CA problems, (3) ratings<br />
by installation security managers of 143 suggestions for improving CA, and<br />
(4) write-in suggestions (n = 636) by installation security managers, unit<br />
security managers, and unit commanders for improving CA.<br />
In order to have a common basis for comparing the quantitative and<br />
qualitative data and to facilitate the interpretation of the survey<br />
results, a taxonomy of CA problem/recommendation (or "finding") areas was<br />
developed. This taxonomy included eight general categories: (1) security<br />
education for cleared personnel; (2) training for security personnel; (3)<br />
derogatory information indicators, sources, and methods; (4) clearance<br />
adjudication procedures; (5) accountability for CA; (6) CA regulations; (7)<br />
CA emphasis; and (8) CA system considerations (e.g., legal issues, number<br />
of cleared personnel).<br />
Obstacles in Maintaining an Effective CA Program. In general, analyses of<br />
the quantitative and qualitative survey data indicated that security<br />
education for cleared personnel, training of security personnel, and<br />
derogatory indicators, sources, and methods are the biggest obstacles in<br />
maintaining an effective CA program across the eight taxonomy areas. The<br />
clearance adjudication process, the emphasis on CA, and CA system<br />
considerations received moderately high rankings across the "CA obstacles"<br />
data sets. CA regulations and accountability for CA received the lowest<br />
overall rankings.<br />
Ratings of 136 specific obstacles to maintaining an effective CA program<br />
were provided by all survey respondents. The six most highly rated items<br />
by collateral site respondents (N=224) are listed below:<br />
- Reluctance of individuals to self-report derogatory information.<br />
- Too much time is taken by central adjudication facility to make<br />
clearance suspension/revocation decisions.<br />
- Lack of/inadequacy of training modules to instruct commanders and<br />
supervisors on how to spot, interpret, and manage the<br />
early warning indicators of personnel security risks.<br />
- Reluctance of coworkers to report derogatory information.<br />
- Lack of standard training modules for unit commanders, supervisors,<br />
and cleared individuals which describe their continuing assessment<br />
responsibilities.<br />
- Delays in obtaining replacement personnel for individuals who lose<br />
security clearances.<br />
Recommendations for Improving CA. Installation security managers rated 143<br />
suggestions for improving CA using a 10-point rating scale. Those items<br />
receiving the highest mean ratings are listed below:<br />
- Develop training modules to instruct commanders and supervisors on<br />
how to spot and manage the early warning indicators of personnel<br />
security risks and personnel problems.<br />
- Modify the regulations to direct other installation groups to<br />
provide more information to the security office.<br />
- Create a separate, full-time position for personnel security<br />
officers.<br />
- Improve continuing assessment training for supervisors.<br />
- Develop formal reporting procedures and written standards for the<br />
personnel, medical, legal, and other departments which define the<br />
types of information to be shared with the security office.<br />
- Increase/improve continuing assessment training for security<br />
managers.<br />
519
Effectiveness of CA. There is limited data for assessing the effectiveness<br />
of current CA programs. Findings from the survey indicated that (1)<br />
approximately 80 percent of the installations surveyed maintain some<br />
statistics relevant to CA (e.g., numbers and types of clearances, numbers<br />
of clearance suspensions and revocations, numbers of security violations,<br />
or numbers of reported derogatory incidents), (2) relatively few derogatory<br />
incidents are reported to the security office (see Table 1), and (3) the<br />
number of clearance suspensions and revocations is very small. Table 2<br />
shows the number of clearance suspensions and revocations during the past<br />
12 months for sites in the survey sample.<br />
Table 2<br />
Approximate Types and Numbers of Clearances, Numbers of Clearance<br />
Suspensions, and Numbers of Clearance Revocations for Survey Sites<br />
                     Mean Estimated   Mean Estimated Number     Mean Estimated Number<br />
                     Total Number     of Clearances Suspended   of Clearances Revoked<br />
                     (per site)       Per 1000 During Past      Per 1000 During Past<br />
                                      12 Months (per site)      12 Months (per site)<br />
Confidential<br />
Clearances                295                  0.8                       0.3<br />
Secret<br />
Clearances               2847                  4.2                       0.5<br />
Top Secret<br />
Clearances                583                  1.2                       0.1<br />
Top Secret<br />
Clearances with<br />
SCI Access                678                  2.4                       1.8<br />
Notes. Estimates are based on information provided by installation<br />
security officers. The sample sizes for these analyses ranged from 48 to 54.<br />
Ratings of overall program effectiveness by installation security managers<br />
indicated that CA programs are moderately effective. The mean<br />
effectiveness ratings were quite similar across service branches, with the<br />
Air Force receiving the highest effectiveness rating among collateral sites and<br />
the Navy receiving the highest effectiveness rating among SCI sites. The<br />
mean effectiveness ratings of SCI and collateral programs were nearly<br />
identical within the Army and within the Air Force, but within the Navy SCI<br />
sites received much higher ratings than collateral sites.<br />
520
Installation security managers also rated the effectiveness of several<br />
aspects of the CA program. The results indicated that the clearance<br />
suspension/revocation process, sources of derogatory information, service<br />
branch regulations, indicators of security risk, reporting procedures, and<br />
security education are considered most effective. In contrast, the two<br />
lowest rated program aspects were performance appraisal information and<br />
incentives for reporting. The mean ratings were generally similar across<br />
service branches and for collateral and SCI sites.<br />
In summary, little is known about the effectiveness of existing CA programs<br />
in the military services. The limited data suggest that these programs are<br />
moderately effective, although they could be improved.<br />
Future Research<br />
Overall, the project resulted in 52 recommendations for improving CA<br />
programs (see Bosshardt, DuBois, Crawford, & McGuire, 1990; Bosshardt,<br />
DuBois, & Crawford, 1990b). The next step in this research program is to<br />
have personnel security experts from DOD, service branch headquarters,<br />
field installations, and the adjudication facilities prioritize these<br />
recommendations. Future research will focus on the highest priority items.<br />
References<br />
Bosshardt, M. J., DuBois, D., & Crawford, K. (1990a). Survey of<br />
continuing assessment programs in the military services:<br />
Recommendations. Monterey, CA: Defense Personnel Security Research and<br />
Education Center.<br />
Bosshardt, M. J., DuBois, D., & Crawford, K. (1990b). Survey of<br />
continuing assessment programs in the military services: Systems issues,<br />
recommendations, and program effectiveness. Monterey, CA: Defense<br />
Personnel Security Research and Education Center.<br />
Bosshardt, M. J., DuBois, D., Crawford, K., & McGuire, D. (1990). Survey<br />
of continuing assessment programs in the military services: Methodology,<br />
analyses, and results. Monterey, CA: Defense Personnel Security<br />
Research and Education Center.<br />
DOD Security Review Commission, General Richard Stilwell (Chairman).<br />
(1985). Keeping the nation's secrets: A report to the Secretary of<br />
Defense by the Commission to Review DOD Security Policies and Practices.<br />
Washington, D.C.: Office of the Secretary of Defense.<br />
DuBois, D., Bosshardt, M. J., & Crawford, K. (1990). Continuing<br />
assessment of cleared personnel in the military services: A conceptual<br />
analysis and literature review. Monterey, CA: Defense Personnel<br />
Security Research and Education Center.<br />
521
A MEASURE OF BEHAVIORAL RELIABILITY<br />
FOR MARINE SECURITY GUARDS<br />
Janis S. Houston<br />
Personnel Decisions Research Institutes<br />
Martin F. Wiskoff<br />
BDM <strong>International</strong>, Inc.<br />
and<br />
Forrest Sherman<br />
Marine Security Guard Battalion<br />
Problem and Background<br />
The United States Marine Corps provides security guard services to meet the Department<br />
of State requirements at Foreign Service posts throughout the world. This use of Marines<br />
as security guards at Embassies, Legations, and Consulates was initiated in 1948 by a<br />
formal Memorandum of Understanding between the Department of State and the Secretary<br />
of the Navy. The primary mission of the Marine Security Guards is to protect the<br />
personnel, property, and classified and administratively controlled material and equipment<br />
within these premises.<br />
There are approximately 1300 Marine Security Guards (MSGs) currently serving at 140<br />
foreign posts in over 100 countries. These detachments range in size from five to thirty-eight<br />
Marines, and each is commanded by a senior non-commissioned officer, referred to<br />
as the “Detachment Commander”.<br />
The work described here was the fourth phase of a research effort undertaken jointly by<br />
the Marine Security Guard Battalion and the Defense Personnel Security Research and<br />
Education Center. Prior phases of this effort focused on improving the procedures used<br />
for pre-screening and selecting Marines for MSG duty, and are described in Parker,<br />
Wiskoff, McDaniel, Zimmerman, and Sherman (1989) and in Wiskoff, Parker, Zimmerman,<br />
and Sherman (1989).<br />
Objective<br />
The primary objective of this work was to develop a system for the continuing evaluation<br />
(CVAL) of MSG performance and behavioral reliability. As has been pointed out<br />
(DuBois, Bosshardt, and Crawford, 1990), recent espionage cases suggest that individuals<br />
become spies as a result of personal and situational factors that occur after they receive<br />
personnel security clearances and are performing in sensitive or high security risk<br />
jobs. The importance of having a continuing assessment program for MSGs, in addition<br />
to the very careful selection procedures, was highlighted in December 1986, when Sgt.<br />
Lonetree admitted to providing information to the Soviet Union while serving in<br />
Moscow as an MSG.<br />
522
The goal for the CVAL system was to reduce the risk of personnel security incidents and<br />
improve the ability of Detachment Commanders to anticipate personnel problems before<br />
they became major disruptions. Thus, there was some emphasis on being able to use<br />
CVAL as a kind of warning system, one which would indicate when there was a need to<br />
intervene, either with informal counseling or disciplinary action short of judicial punishment.<br />
In this context, then, there were several ancillary objectives for the development of<br />
CVAL: (1) to provide an early warning indicator, with suggestions for intervention; (2)<br />
to provide a leadership, counseling, and training tool for Detachment Commanders; and<br />
(3) to minimize personnel turbulence and facilitate/document personnel decisions made<br />
concerning the reliability of MSGs.<br />
Method of Development<br />
General Orientation. It was felt from the outset that some kind of behavioral checklist<br />
would be an appropriate format for the cornerstone of CVAL. In a recent review of<br />
personnel reliability programs (Bosshardt, DuBois, and Crawford, 1990), the need was<br />
pointed out for more careful definition of the factors that may indicate an individual has<br />
become a security risk. In the current project, we wanted to produce a checklist of<br />
observable behaviors that could indicate when an MSG’s performance was beginning to<br />
exhibit signs of unreliability. This checklist could then be completed by the Detachment<br />
Commander on a regular basis for each MSG, and appropriate action taken.<br />
Sources of Information. The primary source of information for the development of a<br />
CVAL checklist was the huge collection of written examples of MSG<br />
performance/behavior generated in a prior phase of this research effort. These performance<br />
examples were used in the prior research to develop behaviorally-anchored rating<br />
scales that could serve as criteria for validity investigations of the screening procedures<br />
(Houston, 1989).<br />
To obtain the performance examples, workshops were conducted with MSGs, Detachment<br />
Commanders, and the Instructors/Advisors at MSG School, all of whom had prior<br />
experience as MSGs and/or Detachment Commanders. Participants in the workshops<br />
were asked to write (in a structured format) examples of MSG behaviors that were indicative<br />
of extremely effective, average, and extremely ineffective performance. This technique<br />
yielded over 300 examples of behavior that realistically portrayed both highly<br />
effective and highly ineffective MSG performance. The examples were then sorted into<br />
categories that represented important dimensions of the MSG job. The set of dimensions,<br />
and the list of behaviors in each dimension, was the starting point in the development<br />
of a CVAL measure.<br />
Other sources of information included: evaluation forms that had been developed for use<br />
at MSG School, e.g., Peer Evaluation Forms and Screening Board Evaluation Forms;<br />
checklists developed for use as indicators of chemical dependency and emotional instability;<br />
and reports of existing personnel reliability programs, e.g., the Air Force’s Nuclear<br />
Weapons Personnel Reliability Program (PRP), the Department of Energy’s Human<br />
Reliability Program (HRP), and the Navy’s Security Access Eligibility Report (SAER).<br />
Another helpful source of information was the record of MSG Non Judicial Punishments<br />
and Reliefs For Cause kept at MSG Battalion Headquarters. A content analysis of these<br />
records was performed, to determine what types of behavior problems seemed to be the<br />
523
most common. Finally, MSG Battalion personnel were extensively interviewed, to solicit<br />
their ideas on what behaviors represented potential reliability problems.<br />
Preparation of Behavior Indicators Checklist. All of the information described above<br />
was converted to a list of discrete behaviors that could indicate a potential personnel<br />
security risk. These behaviors were sorted, where possible, into the categories used for<br />
the MSG performance rating scales developed in the prior phase of this research. New<br />
categories were formed where the pre-existing system did not seem to cover clusters of<br />
behaviors, and a number of the pre-existing categories were combined and/or renamed,<br />
as appropriate.<br />
The first draft of the Behavior Indicators Checklist contained 61 behaviors, grouped into<br />
10 clusters or behavior categories. Each of these behaviors was considered to be an<br />
indication that an MSG might be headed for, if not already in, some kind of trouble,<br />
ranging from emotional instability to drinking problems, or simply not realizing the<br />
dangers of becoming too friendly with Foreign Service Nationals about whom little was<br />
known.<br />
Examples of checklist behaviors are: “MSG often becomes disorderly or violent when<br />
drinking” and “A Foreign Service National shows a sudden increase of favors towards<br />
this MSG.” There were a number of behaviors that, while not particularly desirable, may<br />
not indicate a real problem if the behavior is relatively short in duration, for example,<br />
“MSG frequently asks to get off duty early or switch duty assignments.” There might be<br />
an acceptable reason for the latter example, e.g., visiting relatives or a special, detachment-related<br />
project. The important point here is that the Detachment Commander<br />
should be aware of the reason for these behaviors, and, if appropriate, take action to<br />
decrease undesirable or dangerous behaviors.<br />
Field Review: An Iterative Process. There were two rounds of field review of the<br />
Behavior Indicators Checklist. In both cases, the checklist was taken out to MSG detachments<br />
and feedback was obtained in small group (or one-on-one) structured interviews<br />
with incumbent MSGs, Detachment Commanders, and a number of the Department<br />
of State officials who work with MSGs in the detachments. Sites were selected<br />
with the following criteria in mind: (1) detachments with Commanders who had a fair<br />
amount of experience in the MSG program; (2) as much geographical dispersion as<br />
possible, within the constraints of our budget; (3) sites that varied in terms of their perceived<br />
desirability (a function of potential threat and of general desirability and hospitality<br />
of the location); (4) detachments that varied in terms of their size, i.e., number of<br />
MSGs; and (5) at least some detachments where there was an obviously high threat of<br />
counter intelligence activity (e.g., Eastern Bloc countries).<br />
The first round of site visits included Vienna, Prague, Belgrade, and Athens. In the<br />
interviews at each detachment, the draft checklist was discussed, item by item, to address<br />
the following issues:<br />
(1) the appropriateness and clarity of the wording;<br />
(2) the extent to which each behavior did, in fact, indicate a potential personnel problem;<br />
(3) the comprehensiveness of the list of behaviors, i.e., whether there were any behavior<br />
indicators that we had overlooked; and<br />
(4) the response format that should be used for the checklist.<br />
524
Based on the feedback received from the first round of site visits, a specific response<br />
format was selected, and a number of revisions were made to the checklist, including<br />
specific wording changes to increase clarity or applicability, the addition of several<br />
behaviors and the deletion of a few, and the combining of two categories that were seen<br />
as overlapping. This draft was reviewed by MSG Battalion personnel, including the<br />
MSG School Instructors/Advisors, and the second round of site visits was scheduled.<br />
There were six detachments visited in the second field review, one in the Middle East,<br />
four in SubSaharan Africa, and one in Western Europe. The same format was followed<br />
for these site visits in terms of the individuals interviewed and the topics covered. There<br />
were several more suggestions for additions and deletions, and a number of further<br />
wording changes recommended. The checklist was again revised, based on these<br />
recommendations, and was again reviewed by MSG Battalion personnel. The final set of<br />
categories were entitled:<br />
A. Job Performance E. Social Behavior<br />
B. Liberty Behavior            F. Emotional Behavior<br />
C. Drinking Behavior G. Money-Related Behavior<br />
D. Personal Relations/<strong>Association</strong>s H. Physical Health and Appearance<br />
Each category had a list of relevant behaviors, an Overall Rating Scale, and a space to<br />
write comments related to that category of behavior. There were four response options<br />
for each behavior: “Definitely Yes”, “Yes Somewhat”, “Definitely No”, and “Not Relevant”.<br />
Every “Yes” response required a written comment in the space provided for that<br />
category. The Overall Rating Scale for each category was a seven-point scale, where the<br />
lowest rating indicated that there were “Definite Problems”, and the highest rating indicated<br />
that the MSG’s “Behavior [was] Always Exemplary”.<br />
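The checklist structure described above (four response options per behavior, a required written comment for every "Yes" response, and a 7-point overall rating per category) can be sketched as a simple data model. This is our own hypothetical illustration; the class and field names are not the Battalion's:<br />

```python
# Hypothetical sketch of the Behavior Indicators Checklist structure:
# four response options per behavior, a comment required for every "Yes",
# and a 7-point overall rating per category. Names are ours, not the source's.

from dataclasses import dataclass

RESPONSES = {"Definitely Yes", "Yes Somewhat", "Definitely No", "Not Relevant"}
YES_RESPONSES = {"Definitely Yes", "Yes Somewhat"}

@dataclass
class BehaviorItem:
    text: str
    response: str
    comment: str = ""
    high_threat_only: bool = False  # items flagged for high counter-intelligence-threat posts

@dataclass
class Category:
    name: str                 # e.g., "Drinking Behavior"
    items: list
    overall_rating: int       # 1 = "Definite Problems" ... 7 = "Behavior Always Exemplary"

    def validation_errors(self) -> list:
        errors = []
        for item in self.items:
            if item.response not in RESPONSES:
                errors.append(f"invalid response: {item.response!r}")
            elif item.response in YES_RESPONSES and not item.comment:
                errors.append(f"'Yes' response without comment: {item.text!r}")
        if not 1 <= self.overall_rating <= 7:
            errors.append("overall rating must be 1-7")
        return errors
```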
Since a number of the behaviors in the checklist were most appropriate or most critical<br />
for countries with a high threat of counter intelligence activity (e.g., Eastern Bloc countries),<br />
these behaviors were identified as such. Examples are: behaviors related to “fraternization”<br />
and behaviors related to using the “buddy system” whenever leaving the<br />
Embassy compound.<br />
Trial Usage and Evaluation. As an additional check on the readiness of the Behavior<br />
Indicators Checklist, Detachment Commanders were asked to use it for several months<br />
on a “For Research Only” basis. Commanders were briefed on the purpose of the checklist<br />
and were instructed to fill one out for each MSG in their detachment, after the MSG<br />
had been with the detachment for 90 days. They were further instructed to mail completed<br />
checklists directly to the researchers.<br />
A total of 792 completed checklists were received. These were reviewed for response<br />
errors, e.g., checking two response options for one behavior; and for illogical patterns of<br />
responding, e.g., many negative behaviors checked in a category, with a very high overall<br />
rating for that category. Additionally, all written comments were reviewed. Based on<br />
these investigations, there did not appear to be any problems with the response format or<br />
with the overall clarity and understandability of the checklist.<br />
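The two screening checks described above can be sketched as follows. The numeric thresholds are our own illustration; the report does not state formal rules for flagging an illogical pattern:<br />

```python
# Sketch of the two response-screening checks described above.
# Thresholds are invented for illustration; the report gives no numeric rules.

def response_error(selected_options: set) -> bool:
    # e.g., two response options checked for one behavior, or none at all
    return len(selected_options) != 1

def illogical_pattern(yes_count: int, overall_rating: int,
                      yes_threshold: int = 3, rating_threshold: int = 6) -> bool:
    # many negative behaviors endorsed, yet a very high overall category rating
    return yes_count >= yes_threshold and overall_rating >= rating_threshold

print(response_error({"Definitely Yes", "Definitely No"}))  # True: double-marked item
print(illogical_pattern(yes_count=4, overall_rating=7))     # True: inconsistent form
```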
There was also an attempt made to gather some other criterion data for the MSGs for<br />
whom we had completed checklists, to see if the patterns of response on the checklist<br />
made at least intuitive sense when compared to another performance/reliability measure.<br />
525
_II_ -..-.- -_- -..__.. -. ..~ ..---- _ . ..-- --. .~<br />
The most logical criterion data for this purpose were the available records of Non Judicial<br />
Punishments (NJP) and Reliefs for Cause (RFC). The base rates for these criteria,<br />
however, are so low that there were very few cases (N=40) where we had both a completed<br />
checklist and a record of either NJP or RFC. All 40 “matches” were with NJPs;<br />
there were no matches with RFC. In over half of these 40 cases, the NJP predated the<br />
completion of the checklist, so they were of no use in determining if the checklist could<br />
predict personnel problems. Checklists for the remaining few matches were examined<br />
and the response patterns did indeed seem to indicate that behavior problems were detected<br />
prior to the incident that incurred the Non Judicial Punishment.<br />
Near the end of the “For Research Only” usage period, a questionnaire was sent to all<br />
140 detachments, asking for an evaluation of the checklist and of the User’s Guide that<br />
accompanied it. Subjects covered by this questionnaire included:<br />
(1) Clarity of User’s Guide and checklist content;<br />
(2) Clarity of format (“user friendliness”);<br />
(3) Ease/Difficulty of making accurate ratings;<br />
(4) Time to complete the checklist/extent of administrative burden;<br />
(5) Completeness of checklist;<br />
(6) Usefulness of checklist; and<br />
(7) Recommendation for continued use.<br />
There were 106 questionnaires returned; a 76% return rate. The results can be summarized<br />
as follows. Both the User’s Guide and the checklist itself were reported to be clear,<br />
understandable, and “user friendly”. It was “fairly” to “very” easy to make accurate<br />
ratings. It took an average of 28 minutes to complete the checklist, and was considered<br />
to be a “reasonable” to “minimal” administrative burden (versus “excessive”). The list of<br />
behavior indicators on the checklist was considered to be “very complete”, and it was<br />
reported to be “pretty useful” (this was the second-highest usefulness response option;<br />
the highest was “extremely useful”). Recommendations regarding continued use of the<br />
checklist were:<br />
Yes, as it stands      76<br />
Yes, with revisions    20<br />
No                      7<br />
No response             3<br />
Total                 106<br />
Of the twenty Detachment Commanders who indicated “Yes, with revisions”, most did<br />
not make specific recommendations for revision. Those who did comment referred to<br />
procedural revisions rather than revisions to the<br />
checklist items (e.g., use the checklist as a formal Counseling Sheet).<br />
Final Implementation of Checklist<br />
The final draft of the CVAL Behavior Indicators Checklist is now ready for implementation.<br />
An outline of the guidelines recommended for its use follows:<br />
(1) The keynote in interpreting the CVAL checklist is to look for behavioral change over<br />
time; to look for patterns that are out of character for that individual. For example, if<br />
a Marine is typically fairly quiet, then it should be of little concern that he doesn’t<br />
engage in a lot of casual conversation with his fellow Marines. If, on the other hand,<br />
a Marine is usually very outgoing and talkative, and he suddenly “goes quiet”, there<br />
may be a problem.<br />
(2) Not all behaviors on the checklist are particularly damning in and of themselves.<br />
Although there are no items on the checklist that represent perfectly healthy behavior<br />
for someone in the MSG position, there may be a reasonable explanation for an MSG<br />
exhibiting a particular behavior. Virtually every behavior on the checklist, however,<br />
should motivate the Detachment Commander to ask “Why?”. If there is no apparent<br />
reason for the behavior, attempts should be made to find out what the trouble is, for<br />
example, by observing the MSG more closely, or talking with him about the behavior.<br />
(3) The severity of some of the checklist behaviors depends significantly upon detachment<br />
location. For example, there are obvious differences in the implications some<br />
behaviors have for Eastern Bloc countries versus other countries.<br />
References<br />
Bosshardt, M. J., DuBois, D. A., & Crawford, K. (1990). Continuing assessment of<br />
cleared personnel in the military services: Findings and recommendations (Institute<br />
Report No. 193). Minneapolis, MN: Personnel Decisions Research Institutes.<br />
DuBois, D. A., Bosshardt, M. J., & Crawford, K. (1990). Continuing assessment of<br />
cleared personnel in the military services: A conceptual analysis and literature<br />
review (Institute Report No. 190). Minneapolis, MN: Personnel Decisions Research<br />
Institutes.<br />
Houston, J. S. (1989). Development of measures of Marine Security Guard performance<br />
and behavioral reliability (Institute Report No. 171). Minneapolis, MN: Personnel<br />
Decisions Research Institutes.<br />
Houston, J. S., Wiskoff, M. F., & Sherman, F. (In press). A measure of behavioral reliability<br />
for Marine Security Guards: A final report (PERSEREC-SR-90-m).<br />
Monterey, CA: Defense Personnel Security Research and Education Center.<br />
Parker, J. P., Wiskoff, M. F., McDaniel, M. A., Zimmerman, R. A., & Sherman, F.<br />
(1989). Development of the Marine Security Guard Life Experiences Questionnaire<br />
(PERSEREC-SR-89408). Monterey, CA: Defense Personnel Security Research<br />
and Education Center.<br />
Wiskoff, M. F., Parker, J. P., Zimmerman, R. A., & Sherman, F. (1989). Predicting<br />
school and job performance of Marine Security Guards (PERSEREC-SR-89-013).<br />
Monterey, CA: Defense Personnel Security Research and Education Center.<br />
SYMPOSIUM: JOB PERFORMANCE TESTING FOR ENLISTED PERSONNEL<br />
J. H. Harris (Chair), Charlotte H. Campbell,<br />
and Roy C. Campbell<br />
NO ABSTRACT RECEIVED<br />
NAVY: HANDS-ON AND KNOWLEDGE TESTS FOR THE NAVY RADIOMAN<br />
Earl L. Doyle and Roy C. Campbell<br />
Human Resources Research Organization<br />
Introduction<br />
The Navy approach to the Job Performance Measurement Project focused on<br />
the development of benchmark hands-on job proficiency tests which would, in<br />
turn, guide the development of written task-specific tests and written general<br />
knowledge tests that could be used as substitute measures of job performance.<br />
One of the jobs selected for this effort was the entry level Radioman (RM).<br />
These individuals qualify for their rating by graduating from the Navy Class A<br />
Radioman school at San Diego, California. After qualification they typically<br />
serve in one of two types of facilities--either a shore-based installation or<br />
on board ship.<br />
This paper will review the major steps in development, the highlights of<br />
field test administration, and the principal findings of this research.<br />
Hands-On Tests<br />
Test Development<br />
Tasks to be tested were selected by a panel of experts consisting<br />
primarily of Senior Radiomen from the Navy Class A Radioman School (Lammlein,<br />
1987). Twenty-two tasks were initially identified. Project test developers,<br />
working with Radioman School instructional staff, integrated those tasks that<br />
are normally closely associated when performed on the job. This resulted in<br />
the development of the 14 tests shown in Table 1.<br />
Table 1<br />
Radioman Tasks for Hands-On Tests<br />
*Act as a Broadcast Operator<br />
*File Messages<br />
Change Paper/Ribbons on Teletype<br />
Establish System - November<br />
Perform Maintenance on Receiver<br />
*Prepare Message - DD173<br />
*Type/Format/Edit Message<br />
*Log Incoming Messages<br />
*Manually Route Messages<br />
Establish System - Golf<br />
*Inventory Classified Documents<br />
Perform Maintenance on Transmitter<br />
*Verify Outgoing Message<br />
*Prioritize Outgoing Messages by Precedence and Time<br />
*Indicates product scored test.<br />
The developed tests were based on an analysis of the individual and<br />
component tasks and consisted of dichotomously scored (GO/NO-GO) performance<br />
measures corresponding to steps done or characteristics of products produced.<br />
The performance tests utilized product scoring wherever a product was produced<br />
as a complete or partial result of the performance. Where feasible, product<br />
scoring is desirable because, correctly administered, it can enhance<br />
reliability. The nine tests that utilized at least partial product scoring<br />
are identified as such in Table 1.<br />
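Whether a step is scored from observed performance or from a product, a dichotomously scored test reduces to the proportion of GO steps; a minimal sketch (the data layout is assumed for illustration, not the project's scoring system):<br />

```python
# Minimal sketch of dichotomous GO/NO-GO scoring; the data layout is
# an assumption for illustration, not the project's scoring system.

def percent_go(step_scores):
    """Score one hands-on test: percentage of steps scored GO.

    step_scores -- list of booleans, True for GO, False for NO-GO,
    covering both performance-scored and product-scored steps.
    """
    if not step_scores:
        raise ValueError("no scored steps")
    return 100.0 * sum(step_scores) / len(step_scores)
```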
In addition to the scoresheets, developers prepared equipment setup<br />
instructions, instructions to the examinees, and scoring instructions. The<br />
entire test was designed to be administered at a single station using either<br />
an actual or a simulated ship's radio shack. Although the 14 tests were<br />
independent, they were operationally interconnected so they fit logically and<br />
sequentially into the test situation and location.<br />
Written Tests<br />
Written tests were developed that corresponded to the 22 tasks covered<br />
in the hands-on tests. Three features characterized these tests:<br />
• The tests were performance based. Items were based<br />
on either performing the same steps required in the<br />
hands-on test or on answering a question of how a step is done.<br />
• The tests were founded on performance errors. To ensure items<br />
were performance oriented, the causes of error in performance<br />
were identified. Error was identified as having four origins:<br />
the Radioman did not know where to perform (location), did not<br />
know when to perform (sequence), did not know what the product<br />
of correct performance was (recognition), or did not know how<br />
to perform (technique).<br />
• The tests provided likely behavioral alternatives. Incorrect<br />
alternatives were based on likely errors that were possible and<br />
do occur on the job. Incorrect alternatives also had to be<br />
wrong, not merely less desirable than the correct alternative.<br />
The development result was an 87 item test in a multiple choice format<br />
that was organized into 11 topical, functional task areas that generally<br />
corresponded to the 14 hands-on test areas. (Several of the hands-on test<br />
areas that needed to be treated separately for administrative and equipment<br />
set-up requirements were combined for the written test, and one written test<br />
task area did not survive validation.) These 11 written test areas were<br />
organized so they could be administered and analyzed independently.<br />
General Knowledge Test<br />
The third area of RM testing was a written general knowledge test. Like<br />
the written performance test, this was a multiple choice test and was based on<br />
the same tasks that generated the hands-on tests. The difference between the<br />
two written tests was that the written performance test was specifically<br />
designed to measure performance while the general knowledge test measured the<br />
application of knowledge to the task subject--which may not necessarily<br />
reflect performance. For example, the written performance test might describe<br />
a situation and ask what EMCON condition should be imposed under those<br />
circumstances; the general knowledge test might ask what EMCON is.<br />
The general knowledge test consisted of 98 items. It was not separated<br />
by task or functional area, and in administration and analysis was treated as<br />
a single test.<br />
Test Administration<br />
The field tests were administered to 61 Radiomen, all of whom were<br />
graduates of the Class A Radioman School, were in paygrades E-2, E-3, and E-4,<br />
and had graduated from the School between 1 and 59 months prior to testing.<br />
(Of the tested population, 79% were in paygrade E-3 and 60% were in the 12<br />
months to 35 months experience window.) Twenty-eight of the 61 sailors tested<br />
were assigned to shore installations at the time of testing and 33 were aboard<br />
ships.<br />
Testing was conducted at two locations, about a month apart. Testing<br />
lasted for 8 hours for each examinee and the three components of the test were<br />
sequentially counterbalanced. Five hands-on scorers were used. All scorers<br />
were project staff and had received extensive task/test training and<br />
calibration. Each Radioman was scored independently by at least two scorers<br />
for each hands-on test.<br />
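Sequential counterbalancing of the three components can be sketched as follows; this is a hypothetical scheduling scheme, not the project's documented assignment procedure:<br />

```python
from itertools import permutations

# Hypothetical sketch of sequentially counterbalancing the three test
# components: successive examinees cycle through the six possible
# component orders so that order effects average out across the sample.
COMPONENTS = ("hands-on", "written performance", "general knowledge")
ORDERS = list(permutations(COMPONENTS))  # 6 possible orders

def order_for(examinee_index):
    """Component order for the examinee at this position in the schedule."""
    return ORDERS[examinee_index % len(ORDERS)]
```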
Field Test Results<br />
Although a wide variety of analyses were conducted (Ford, Doyle,<br />
Schultz, & Hoffman, 1987), this paper will focus on four main areas of<br />
interest. Specifically:<br />
• Interrater reliability of the hands-on tests.<br />
• Internal consistency within test methods.<br />
• Intercorrelations among test methods.<br />
• Assignment effect (ship vs. shore).<br />
Interrater Reliability of the Hands-On Tests<br />
Interrater reliability estimates were computed from a generalizability<br />
theory analysis in which absolute generalizability coefficients were produced (SAS,<br />
1982; Brennan, Jarjoura, & Deaton, 1980). Generalizability estimates were<br />
obtained as if only one rater score were produced and for an average of the<br />
two raters, as shown in Table 2.<br />
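For a fully crossed persons x raters design, the absolute (phi) coefficients for one rater and for the mean of two raters can be estimated from the usual two-way ANOVA variance components. The sketch below is a generic illustration of that computation, not the SAS-based analysis the project actually ran:<br />

```python
# Generic sketch of absolute (phi) generalizability coefficients for a
# fully crossed persons x raters design; an illustration of the method,
# not the project's SAS analysis.

def phi_coefficients(scores):
    """`scores` is a list of [rater1, rater2, ...] rows, one per examinee.
    Returns (phi for one rater, phi for the mean of all raters)."""
    n_p = len(scores)
    n_r = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]
    # Mean squares from the two-way ANOVA without replication
    ms_p = n_r * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in r_means) / (n_r - 1)
    ss_e = sum((scores[i][j] - p_means[i] - r_means[j] + grand) ** 2
               for i in range(n_p) for j in range(n_r))
    ms_e = ss_e / ((n_p - 1) * (n_r - 1))
    # Variance component estimates (negative estimates set to zero)
    var_e = ms_e
    var_p = max((ms_p - ms_e) / n_r, 0.0)
    var_r = max((ms_r - ms_e) / n_p, 0.0)
    def phi(n):  # absolute coefficient for the mean of n raters
        return var_p / (var_p + (var_r + var_e) / n)
    return phi(1), phi(n_r)
```

Averaging over two raters always raises the coefficient, which is why the two-rater column in Table 2 dominates the one-rater column.<br />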
The reliabilities are exceptionally high. This is attributed, first, to the<br />
firm control over the scorers that was possible because they<br />
were members of the project staff and, second, to the high incidence of<br />
product scoring among the tested tasks.<br />
Internal Consistency Within Test Methods<br />
Intertask correlations were computed for the hands-on, written, and<br />
general knowledge tests (the general knowledge test was analyzed for interitem<br />
correlations since it was treated as a single test) and are presented in<br />
Table 3. The obtained coefficients demonstrate acceptable levels of internal<br />
consistency.<br />
Table 2<br />
Generalizability Coefficients for Hands-On Tests<br />
Task One Rater Two Raters<br />
*Broadcast Operator 0.96 0.98<br />
*Log Messages 0.96 0.98<br />
*File Messages 0.95 0.98<br />
*Manually Route Messages 0.95 0.98<br />
Change Paper/Ribbons 0.60 0.75<br />
Establish System - Golf 0.91 0.95<br />
Establish System - November 0.94 0.97<br />
*Inventory Classified Documents 0.69 0.82<br />
Preventive Maintenance - Receiver 0.93 0.96<br />
Preventive Maintenance - Transmitter 0.95 0.98<br />
*Prepare Message DD173 0.98 0.99<br />
*Prioritize Outgoing Messages 0.90 0.95<br />
*Type/Format/Edit 0.96 0.98<br />
*Verify Outgoing Messages 0.97 0.98<br />
*Indicates primarily product scored tests.<br />
Table 3<br />
Intertask/Item Correlations by Test Component<br />
Component Correlation<br />
Hands-On 0.89<br />
Written Performance 0.74<br />
General Knowledge 0.71<br />
Intercorrelations Among Test Methods<br />
Correlations, particularly between hands-on and written performance<br />
tests, are important because of the possibility of substituting written tests<br />
for resource-demanding hands-on tests. The correlations between written and<br />
hands-on task tests are shown in Table 4, and the overall correlations between<br />
test methods are shown in Table 5.<br />
Table 4<br />
Correlations Between Written Tests and Hands-On Tests<br />
Written Tests                           Correlation<br />
Broadcast Operator                      .228<br />
Maintain Comm Center File               .370** & .282*<br />
Manually Route Messages                 .422**<br />
Establish Systems - Golf/November       .596** & .101<br />
Inventory Classified Documents          .557**<br />
Preventive Maintenance - Receiver       .523**<br />
Preventive Maintenance - Transmitter    .234*<br />
Verify Outgoing Message                 .299*<br />
Prioritize Messages                     .375**<br />
Type/Format/Edit                        .093<br />
Prepare Message DD173                   .447**<br />
Note. Double correlation figures indicate a single written test covered two<br />
hands-on tests.<br />
*Significance: p&lt;.05. **Significance: p&lt;.01.<br />
Table 5<br />
Correlations Among Scores by Test Method<br />
Test Method            Hands-On    Written Performance    General Knowledge<br />
Hands-On                  --             .71*                   .61*<br />
Written Performance      .71*             --                    .68*<br />
General Knowledge        .61*            .68*                    --<br />
*Significance: p&lt;.01.<br />
These correlations are very high. In a previous study (Rumsey, Osborn,<br />
& Ford, 1985), the authors looked at correlations between hands-on and written<br />
tests for 28 occupations; the overall correlation for the 28 jobs was<br />
.41. For the eight occupations similar to the Radioman, the<br />
hands-on/written correlation was .45, and for the military job most like the<br />
Radioman's--the Army Radio-Teletype Operator--the correlation was .37. Again,<br />
much of the notable result for the Radioman in this area is believed to be<br />
directly a result of the high rater reliability.<br />
Assignment Effect (Ship vs. Shore)<br />
A comparison of the performance of Radiomen on all test methods revealed<br />
marked differences depending on whether the sailors were shore-based or ship-based,<br />
with the ship-based examinees consistently scoring higher. On the<br />
hands-on tests, this difference was significant (at p<br />
Interrater Reliability as an Indicator of<br />
HOPT Quality Control Effectiveness<br />
Major P. J. Exner, USMC<br />
HQ USMC<br />
Jennifer L. Crafts<br />
Daniel B. Felker<br />
Edmund C. Bowler<br />
American Institutes for Research<br />
Paul W. Mayberry<br />
Center for Naval Analyses<br />
The United States Marine Corps Job Performance Measurement<br />
Project is attempting to validate enlistment quality requirements<br />
against actual on-the-job requirements. Since there are nearly<br />
500 Military Occupational Specialties (MOSs), developing hands-on<br />
performance tests (HOPTs) for each MOS is impractical. Therefore<br />
the Marine Corps has elected to test relatively large numbers of<br />
Marines in a few critical MOSs in each of the four Armed Services<br />
Vocational Aptitude Battery composites used for classification.<br />
Testing began with the General Technical (GT) composite in<br />
1986-87 for the infantry occupational field. In 1989-90 tests for<br />
Mechanical Maintenance (MM) composite MOSs were developed and<br />
administered. In August, 1990, hands-on testing was completed on<br />
approximately 1900 Marine automotive and helicopter mechanics.<br />
Because of the many possible sources of error in the<br />
development and administration of HOPTs, quality control is<br />
critical at every step. Poor test design or execution can<br />
significantly reduce validities and diminish the value of the<br />
results. In a preliminary Marine Corps study, Maier (1988)<br />
reported a large reduction in validities due to various errors.<br />
Such errors can include content, test design, test administrator<br />
(TA) training, environmental, temporal, and other effects. One<br />
indicator of possible problems is interrater reliability, or TA<br />
agreement.<br />
In this paper, we will review the quality control<br />
measures used in MM testing and examine preliminary reliability<br />
results across task, test site, MOS, and time.<br />
A series of quality control measures were used to ensure the<br />
quality of hands-on performance data. They include: recruitment<br />
of former or retired Marines to serve as TAs; selection of TA<br />
applicants based on scores on structured interviews; standardized<br />
test site setup; extensive and ongoing training of TAs; rotation<br />
This research was funded by Contract No. N00014-87-C-0001 and by<br />
subcontract CNA 4-89. All statements expressed in this paper are<br />
those of the authors and do not necessarily reflect the official<br />
views or policies of the Department of the Navy or the U.S. Marine<br />
Corps.<br />
of TAs across tasks; shadow scoring; on-site data entry; and<br />
ongoing counselling of TAs.<br />
Recruiting of Former/Retired Marines as TAs<br />
We sought former or retired Marines to serve as TAs,<br />
preferably those with experience in the MOSs which were tested.<br />
This offered several advantages over using civilians or active<br />
duty Marines. Their Marine Corps background enabled them to<br />
relate better to the examinees and promoted a more realistic<br />
testing atmosphere. Also, using former rather than active duty<br />
Marines eliminated a possible bias of Staff Non-Commissioned<br />
Officers toward their troops. Former Marines would have no<br />
vested interest in seeing that "their" mechanics performed well.<br />
Selection of TAs Based on Structured Interviews<br />
All TA candidates were screened using a structured interview<br />
which evaluated their suitability in several categories.<br />
Applicants were questioned concerning their previous experience<br />
in six areas: performance of mechanical tasks; test<br />
administration; administrative duties; planning and organization;<br />
public speaking; and vehicle maintenance. For each dimension,<br />
applicants were evaluated using a three-point scale indicating<br />
no, moderate, or high familiarity. There were more applicants<br />
at the East Coast test sites, but overall TA quality was high at<br />
all locations. West Coast TAs for helicopter testing tended to<br />
be less experienced former Marines than at all other locations.<br />
Standardized Test Site Setup<br />
Testing was conducted at five test sites. There was one site<br />
for automotive testing on each coast, and a single test site for<br />
helicopter testing on the East Coast. Due to the wide separation<br />
of helicopter assets on the West Coast, it was necessary to set<br />
up two test sites there. To reduce site differences, the same<br />
people were involved in establishing the site requirements and<br />
setup procedures at all test sites for air or ground. Where more<br />
than one test site was set up simultaneously, individuals<br />
directing the set up had previous experience at another test<br />
site. Site directors at all sites were involved in the site<br />
requirements determination from beginning to end. Standardized<br />
aircraft/vehicle, test equipment, parts, tools, publications, and<br />
other requirements lists were prepared for all sites. Local<br />
variations in equipment brands, procedures, and facilities were<br />
carefully analyzed for their possible impact and eliminated or<br />
minimized across all sites.<br />
Extensive and Ongoing TA Training<br />
TAs underwent a thorough week-long training program. Most<br />
had served in the Marine Corps where training had been an<br />
integral part of their responsibilities for years. We stressed<br />
the requirement to avoid giving feedback to the examinee which<br />
might influence task performance. TAs were trained on how to<br />
perform each task they were to evaluate and practiced them under<br />
the supervision of active duty subject matter experts. This<br />
included role playing and deliberate errors on the part of the<br />
"examinee" to check TA consistency and develop standardized<br />
scoring of irregular responses. Once test administration was<br />
begun, there were periodic reviews of steps with low interrater<br />
reliabilities, with retraining where necessary.<br />
Rotation of TAs Across Tasks<br />
TAs were trained in multiple tasks to allow them to rotate<br />
among test stations. This lessened the effect of boredom,<br />
provided a cross check on the standardization of scoring in each<br />
task, and reduced the impact of TA differences on scoring.<br />
Shadow Scoring<br />
Perhaps the most important quality control procedure, shadow<br />
scoring involved independent evaluation of an individual's task<br />
performance by two TAs simultaneously. Shadow scorers were used<br />
to monitor TA performance and test reliability, and were<br />
systematically scheduled to capture interactions among testing<br />
order and individual TA characteristics.<br />
On-Site Data Entry Trend Analysis<br />
A Hands-On Score Entry System (HOSES) was developed to enter,<br />
verify, and report analyses of collected data. Daily on-site<br />
data entry enhanced completeness of data and allowed for early<br />
identification of problems with the tests, TA consistency, and<br />
score drift over time. HOSES generated three reports which were<br />
used by site hands-on managers to improve scoring reliability.<br />
1. Data Entry Report. All data were entered twice. This report<br />
verified that there were no discrepancies between the two<br />
entries. It also reported any missing data so the information<br />
could be tracked down on the day of original testing. This<br />
greatly reduced the amount of missing data.<br />
2. The Detailed Discrepancy Report listed all steps where<br />
primary and shadow scorers disagreed. It also gave percent<br />
disagreement for each task, and overall daily total by TA.<br />
3. The Summary Report presented cumulative historical summaries<br />
by TA and task. TA summaries showed leniency and reliability<br />
information for each task administered by the TA. Leniency was<br />
measured as a deviation from the mean percentage of "GO" scores for<br />
all TAs on each task. Reliability indicated disagreement with<br />
all other TAs on each task. These were valuable in identifying<br />
individual TA problems. Since this report could be broken out by<br />
time, it also provided trend information. Task summaries showed<br />
percent "Go" and disagreement for each step. This helped focus<br />
on test effect problems, i.e. those common across all TAs.<br />
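The leniency index in the Summary Report is simply each TA's percent-GO expressed as a deviation from the all-TA mean for the task; a sketch (the data layout is assumed for illustration, not the HOSES implementation):<br />

```python
# Sketch of the Summary Report's leniency index; the data layout is an
# assumption, not the HOSES implementation. Input maps each TA to the
# percentage of "GO" scores that TA awarded on one task.

def leniency(percent_go_by_ta):
    """TA id -> deviation from the all-TA mean percent-GO on a task.
    Positive values mark TAs more lenient than average."""
    mean = sum(percent_go_by_ta.values()) / len(percent_go_by_ta)
    return {ta: pct - mean for ta, pct in percent_go_by_ta.items()}
```

Broken out by time period, the same computation yields the trend information mentioned above.<br />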
Hands-on managers used these reports extensively. Differences<br />
among TAs were discussed, and ambiguities in interpretation of<br />
scoring rules were resolved through discussion and, if required,<br />
additional training. Individual and group trends could be<br />
detected. Individual TA counselling focused on adherence to the<br />
original training standards and the definition and interpretation<br />
of scoreable steps. Hands-on managers avoided overemphasis<br />
on consistency to prevent artificially high levels of agreement.<br />
Interrater Reliability Results<br />
Interrater reliability, or TA agreement, can indicate the<br />
presence of several possible error sources: test design, time,<br />
environmental, or other effects. Interrater reliability is the<br />
percentage agreement between primary and shadow scorers on<br />
individual task steps. It is computed by dividing the number of<br />
steps on which the primary and shadow scorer agreed by the total<br />
number they both graded, summed across all examinees and all<br />
tasks. It was calculated using all observations where both<br />
primary and shadow step scores were available.<br />
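The computation just described can be sketched directly (the data layout is assumed for illustration, not the project's HOSES code):<br />

```python
# Sketch of the interrater (percent) agreement computation described
# above; the data layout is an assumption, not the HOSES code.

def percent_agreement(primary, shadow):
    """Agreement between primary and shadow scorers over the steps
    both graded. Scores are "GO"/"NO-GO" strings, None if missing."""
    graded = [(p, s) for p, s in zip(primary, shadow)
              if p is not None and s is not None]
    if not graded:
        raise ValueError("no steps scored by both raters")
    agreed = sum(1 for p, s in graded if p == s)
    return agreed / len(graded)
```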
Fig. 1: Agreement by Task<br />
Fig. 2: Agreement by Time Period (agreement between primary and shadow test administrators, by time interval)<br />
Figure 1 shows scorer agreement across tasks for automotive<br />
mechanics. Agreement ranged from .873 to .971 indicating that<br />
TAs could reliably differentiate "Go" and "No Go" performance.<br />
The lowest reliabilities at both sites were on troubleshooting<br />
tasks, indicating some ambiguity in scoring the steps on those<br />
tasks. Three of the lowest four reliabilities occurred on tasks<br />
which were hard to observe because of confined spaces. The fact<br />
that the relative reliabilities among tasks were the same between<br />
sites also indicates a good training program and suggests that<br />
reliability differences were due to test effects.<br />
Figure 2 shows temporal effects at auto mechanic test sites.<br />
Site A experienced a slight drop in agreement in the middle time<br />
period. The decline in reliability was noted at the time, so<br />
counselling and retraining were conducted, resulting in an<br />
increase during the final period. Site B agreement increased<br />
during all three periods. The overall increase in agreement at<br />
both sites is natural given increasing familiarity of the TAs<br />
with the scoring standards over time. The fact that the increase<br />
is relatively small indicates that the initial training program<br />
prepared the TAs very well. Again, this points to test effects<br />
as a likely cause of the differences in reliabilities.<br />
This same trend carries over into the helicopter mechanic<br />
reliabilities despite some differences in their HOPTs. Whereas<br />
all automotive mechanics were given the same HOPT, each<br />
helicopter mechanic MOS had its own test. Helicopter MOSs were<br />
tested sequentially, so temporal effects are evident across<br />
aircraft type, as shown in Figure 3. Since test order varied<br />
across site, increased reliability is not indicated left to right<br />
for both sites. Test order for Site C was CH-53A/D, CH-46,<br />
CH-53E, and UH/AH-1. Site D order was UH/AH-1, CH-46, and CH-<br />
53E. No CH-53A/D were tested at Site D. Taking this test order<br />
into account we see that agreement increased over time at both<br />
sites, except for the CH-46 at Site C.<br />
Fig. 3: Agreement by Aircraft (agreement between primary and shadow test administrators, by aircraft)<br />
Fig. 4: Agreement by Interview Ratings<br />
The drop in CH-46 agreement is explainable in terms of<br />
variation in test conditions. At Site D, all examinees in a<br />
particular MOS were tested on the same aircraft. At Site C, each<br />
unit set up its own aircraft, resulting in changing conditions<br />
mechanic among the TAs (the rest were from the other three<br />
aircraft). The reduced agreement may partly reflect this<br />
diminished commonality of experience among the TAs. Even so,<br />
over time, the continuing training program and skill transfer<br />
across aircraft resulted in overall increased reliability.<br />
Figure 4 plots agreement versus the initial TA interview<br />
ratings. The strongest correlation with agreement was for TA<br />
applicants who rated high in test administration, public<br />
speaking, and administrative experience. The negative effect of<br />
maintenance and mechanical familiarity may indicate a bias<br />
resulting from experience. Yet in all cases, reliabilities were<br />
acceptable. Interestingly, among all MM TAs, there was no<br />
significant difference between TAs who had also served on the<br />
infantry project several years earlier and the mechanics hired<br />
for this project.<br />
Conclusion<br />
The high reliabilities found in the preliminary analysis are<br />
encouraging. They indicate that the TA training program was<br />
sound, scoring was well standardized across sites, and that the<br />
HOPT steps were discrete and consistently measurable. There<br />
were also no indicators of any significant test effects or other<br />
systematic problems with the test that would preclude achieving<br />
the high validities obtained in the infantry study. Finally, the<br />
results have implications for HOPT Test Administrator selection.<br />
This analysis seems to indicate that such qualities as previous<br />
test administration experience and public speaking are more<br />
important than experience in the particular field being tested.<br />
Reference<br />
Maier, M. H. (1988). On the Need for Quality Control in<br />
Validation Research. Personnel Psychology, 41, 497-502.<br />
ARMY: JOB PERFORMANCE MEASURES FOR NON-COMMISSIONED OFFICERS<br />
Charlotte H. Campbell and Roy C. Campbell<br />
Human Resources Research Organization<br />
The Army approach to criterion measurement for the JPM project focuses<br />
on two stages in the enlisted person's service time: after about two years in<br />
service, and after three to five years, as a non-commissioned officer (NCO,<br />
corporal E4 or sergeant E5). In this presentation, we report on the job<br />
analysis, development of written test, job sample test, and rating scale<br />
instruments, and testing results for NCOs. The analysis and testing were<br />
conducted on nine jobs, or <strong>Military</strong> Occupational Specialties (MOS), listed in<br />
Table 1.<br />
Table 1<br />
Army <strong>Military</strong> Occupational Specialties (MOS)<br />
11B  Infantryman<br />
13B  Cannon Crewmember<br />
19E  Armor Crewman<br />
31C  Single Channel Radio Operator<br />
63B  Light Wheel Vehicle Mechanic<br />
71L  Administrative Specialist<br />
88M  Motor Transport Operator<br />
91B  Medical NCO<br />
95B  <strong>Military</strong> Police<br />
Job Analysis<br />
For each MOS, a job analysis was performed by aggregating all available<br />
information to define a population of tasks.¹ Sources of job- and task-<br />
analytic information included Soldier's Manuals (both MOS-specific and Common<br />
Task), Army Occupational Survey Program data on performance frequency, data on<br />
¹Job analysis details may be found in J. P. Campbell (Ed.), Improving the<br />
Selection, Classification, and Utilization of Army Enlisted Personnel: Annual<br />
Report, 1987 Fiscal Year (HumRRO Report IR-PRD-88-18), October 1987.<br />
This research was funded by the Army Research Institute on two projects: Improving the Selection,<br />
Classification, and Utilization of Army Enlisted Personnel (Project A) (Project No. MDA903-82-C-0531), and<br />
Building the Career Force (Project No. MDA903-89-C-0202). Project Director is J. H. Harris, and Principal<br />
Scientist is J. P. Campbell, both of Human Resources Research Organization. Contracting Officer's Technical<br />
Representative is Dr. M. G. Rumsey, who is the Chief of the Selection and Classification Technical Area of<br />
the Army Research Institute for the Behavioral and Social Sciences. The views expressed herein are those of<br />
the authors and do not necessarily represent the official position of the Army Research Institute or the<br />
Department of the Army.<br />
541
frequency and importance of supervisory tasks from a special administration of<br />
the Leader Requirements Survey, collection and content analysis of critical<br />
incidents, and interviews with MOS incumbents.<br />
The resulting job domain included supervisory, common, and MOS-specific<br />
tasks and behaviors. Army policy designates certain tasks as being part of<br />
the job for corporals and sergeants; tasks at lower skill levels were included<br />
in the domain because of the Army's policy that soldiers are responsible for<br />
such tasks, and tasks at higher skill levels were included if there was<br />
evidence that soldiers in fact performed such tasks.<br />
Instrument Development²<br />
Information collected using the critical incident methodology was used<br />
to construct a series of rating scales for each MOS, as well as scales that<br />
were not specific to any one MOS but rather reflected Army-wide behaviors.<br />
These scales were used to measure behaviors on all three components of the job<br />
domain -- supervisory, common, and MOS-specific -- by means of ratings<br />
collected from soldiers' supervisors. The 7-point rating scales were<br />
behaviorally-anchored, that is, short descriptions of behaviors that<br />
characterize the low, middle, and high points of each of the scales were<br />
provided. Army-wide supervisory behaviors (e.g., Monitoring, Organizing<br />
Missions and Operations) were addressed by 12 of the scales, 9 scales were<br />
Army-wide and non-supervisory (or common, e.g., Following Regulations and<br />
Orders, Physical Fitness), and for each MOS there were between 7 and 14<br />
MOS-specific dimensions.<br />
For the task-based information, judgments were obtained from subject<br />
matter experts (SMEs) on several task parameters, including performance<br />
difficulty, performance variability, and criticality. The task list for each<br />
MOS was clustered into functional areas, and a second panel of SMEs selected<br />
proportional systematic samples from the task population. These task samples<br />
were subjected to formal reviews by the proponent.<br />
At this point, the task-based instrument development process diverged<br />
into four separate approaches: Job knowledge (written) tests, hands-on job<br />
sample tests, role-play simulations, and written situational judgment tests.<br />
Multiple-choice job knowledge test items were constructed for all of the<br />
MOS-specific and common tasks selected for each MOS. These tests are<br />
characterized by their orientation on task performance and by the extensive<br />
use of graphics and job-relevant contextual information. For each MOS, a<br />
one-hour test of both common and MOS-specific tasks was prepared, comprising<br />
approximately 120 items. Two scores were constructed, for common tasks and<br />
²Details of instrument development are presented in J. P. Campbell (Ed.),<br />
Building the Career Force, First Year Report (in preparation). Rating scales<br />
development and Situational Judgment Test development were directed by W. C.<br />
Borman and M. Hanson of Personnel Designs Research Institute, Inc. Role-play<br />
development was directed by E. D. Pulakos of Human Resources Research<br />
Organization and D. Whetzel of the American Institutes for Research.<br />
Development of hands-on and job knowledge tests was directed by C. H. Campbell<br />
and R. C. Campbell of Human Resources Research Organization, and D. C. Felker<br />
of the American Institutes for Research.<br />
542
for MOS-specific tasks, as the percent of items answered correctly on tasks in<br />
each area.<br />
Hands-on job sample tests were developed to test performance on 8-14 of<br />
the tasks selected for each MOS. The tasks allocated to the hands-on<br />
component included, by design, both common and MOS-specific tasks, at the<br />
target skill level as well as lower and higher skill levels, and from as many<br />
functional areas as was feasible for testing. Scores were constructed as the<br />
percent of steps performed correctly for a given task, averaged across the<br />
common or MOS-specific tasks.<br />
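The hands-on scoring arithmetic described above can be sketched in a few lines. This is an illustration only; the function and variable names are ours, not part of the original scoring procedure.<br />

```python
def task_score(steps_passed, steps_total):
    # Score one hands-on task as the percent of steps performed correctly.
    return 100.0 * steps_passed / steps_total

def hands_on_component_score(task_results):
    # Average the per-task percent scores across the common or
    # MOS-specific tasks allocated to the hands-on component.
    scores = [task_score(passed, total) for passed, total in task_results]
    return sum(scores) / len(scores)

# Hypothetical example: three tasks, each as (steps passed, total steps).
component = hands_on_component_score([(8, 10), (12, 15), (5, 5)])
```

The same percent-correct logic, applied to items rather than task steps, yields the job knowledge test scores.<br />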
Examination of the supervisory tasks selected for each MOS revealed a<br />
common structure of three areas of supervisory behaviors across the nine MOS:<br />
Personal Counseling, Disciplinary Counseling, and Training. To measure these<br />
three aspects of the job, simulation exercises (role-plays) were developed.<br />
The role of a private was played by a trained civilian test scorer (three<br />
different scorers performed the three roles for a given soldier). At the<br />
conclusion of a role-play, the actor/scorer rated the soldier on 12-18 aspects<br />
of behavior during the exercise. Each aspect was rated by means of a 3-point<br />
behaviorally-anchored rating scale, and an overall score was computed as the<br />
average across the three role-plays of the mean rating on items within the<br />
role-play.<br />
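The two-level averaging rule for the role-play score can be sketched as follows; the example ratings are hypothetical (actual scorers rated 12-18 aspects per exercise, not three).<br />

```python
def role_play_overall(exercise_ratings):
    # exercise_ratings holds one list of 3-point aspect ratings per
    # role-play exercise (Personal Counseling, Disciplinary Counseling,
    # Training). The overall score is the mean across exercises of the
    # mean rating on the aspects within each exercise.
    per_exercise = [sum(r) / len(r) for r in exercise_ratings]
    return sum(per_exercise) / len(per_exercise)

# Hypothetical 3-point ratings on three aspects per exercise.
overall = role_play_overall([[3, 2, 1], [2, 2, 2], [1, 3, 2]])
```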
The written situational judgment tests were designed to tap those areas<br />
of supervisory behaviors that could not be included in the role-plays. They<br />
were intended to evaluate the effectiveness of the NCO's judgments about what<br />
to do in difficult supervisory situations, and were meant to tap the cognitive<br />
aspects of first-line supervisory practice in the Army. The test contained 35<br />
items, consisting of a situation and 3-5 alternative courses of action;<br />
soldiers indicated which response alternatives they believed to be the most<br />
and the least effective. Effectiveness weights were assigned to each response<br />
of each item with the assistance of the Sergeants Major Academy, and item<br />
scores were computed as the weight of the soldier's "Most Effective" response<br />
minus the weight of the soldier's "Least Effective" response. The total score<br />
was the mean of the item scores.<br />
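The situational judgment scoring rule (Most-minus-Least weights, averaged over items) can be sketched as below; the weights and response labels are invented for illustration.<br />

```python
def sjt_item_score(weights, most, least):
    # Effectiveness weight of the response picked as "Most Effective"
    # minus the weight of the response picked as "Least Effective".
    return weights[most] - weights[least]

def sjt_total_score(items):
    # Total score is the mean of the item scores; each item is
    # (effectiveness weights, "Most" choice, "Least" choice).
    return sum(sjt_item_score(w, m, l) for w, m, l in items) / len(items)

# Hypothetical weights for a two-item test with responses "a"-"c".
items = [
    ({"a": 1.5, "b": 0.5, "c": -1.0}, "a", "c"),   # item score: 2.5
    ({"a": 2.0, "b": 0.0, "c": -0.5}, "b", "a"),   # item score: -2.0
]
total = sjt_total_score(items)
```

Note that an examinee who picks a weakly effective response as "Most" and a strongly effective one as "Least" earns a negative item score, which is why the observed totals below range below zero.<br />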
Figure 1 portrays the test mode (written, job sample, and ratings) by<br />
job component (supervisory, common task, and MOS-specific) coverage among the<br />
testing instruments.<br />
Test Administration and Results<br />
Data were collected from 1009 soldiers and their supervisors (rating<br />
scales only) in the nine MOS at 13 Army posts in CONUS and in Germany. The<br />
hands-on tests were administered by NCO scorers under the supervision of<br />
trained civilian staff; all other instruments were administered by trained<br />
members of the project staff.<br />
Table 2 gives the basic statistical characteristics for each instrument,<br />
across the nine MOS. For every instrument, the mean scores are above the<br />
midpoint. However, there is no strong evidence of skew in the data, and the<br />
reliability estimates are satisfactory.<br />
543
                                              Job Components<br />
Test Mode         Supervisory                  Common                       MOS-Specific<br />
WRITTEN TESTS     Situational Judgment Test    Job Knowledge Tests          Job Knowledge Tests<br />
                  (Mean of effectiveness       of Common Tasks              of MOS-Specific Tasks<br />
                  weight for "M" responses     (Percent items correct)      (Percent items correct)<br />
                  minus effectiveness weight<br />
                  for "L" responses)<br />
JOB SAMPLE TESTS  Supervisory Role-Plays       Hands-On Tests               Hands-On Tests<br />
                  (Mean across role-plays      of Common Tasks              of MOS-Specific Tasks<br />
                  of ratings on 3-point        (Mean across tasks of        (Mean across tasks of<br />
                  effective behavior scales)   percent steps passed)        percent steps passed)<br />
RATINGS           Rating Scales - Army-Wide    Rating Scales - Army-Wide    Rating Scales -<br />
                  Supervisory Dimensions       Non-Supervisory Dimensions   MOS-Specific Scales<br />
                  (Mean across dimensions      (Mean across dimensions      (Mean across dimensions<br />
                  of supervisor ratings on     of supervisor ratings on     of supervisor ratings on<br />
                  7-point rating scales)       7-point rating scales)       7-point rating scales)<br />
Figure 1. <strong>Testing</strong> instruments providing coverage of each job component, by<br />
test mode.<br />
Table 2<br />
Statistical Characteristics of Test Instruments Across Nine MOS<br />
                               Supervisory          Common               MOS-Specific<br />
                              Mean   SD   Rel.     Mean   SD   Rel.     Mean   SD   Rel.<br />
Situational Judgment Tests    1.37   .60  .75<br />
Job Knowledge Tests                                65.4  12.5  .79      64.9  13.5  .73<br />
Supervisory Role-Plays        2.26   .42  .71<br />
Hands-On Tests                                     72.6  15.4  .46      69.4  19.5  .44<br />
Rating Scales - Army-Wide     4.49  1.06  .50      5.13  1.13  .48<br />
Rating Scales - MOS-Specific                                            5.19  0.97  .43<br />
Note. Situational judgment test scores ranged from -.77 to 2.57 (thus the mean score of 1.37 is roughly<br />
equivalent to a score of 4.46 on a 7-point scale, with a standard deviation of 1.26); the reliability estimate<br />
is split-half on items, corrected to test length.<br />
Job knowledge test and hands-on test scores are proportions correct; the reliability estimate for job knowledge<br />
tests is the median across MOS of a split-half on odd-even items, corrected to test length; the reliability<br />
estimate for hands-on tests is the median across MOS of the split-half on task scores, corrected to number<br />
of tasks.<br />
Ratings were made on a 7-point scale, where a 1 represents poor performance; reliability estimates are one-<br />
rater reliabilities across dimensions, using the median across MOS for MOS-specific ratings.<br />
Role-play ratings were made on a 3-point scale, where a 1 represents less effective supervision; reliability<br />
estimates are the median one-rater reliability across items, averaged across the three role-plays.<br />
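The odd-even split-half estimate used for the written tests, "corrected to test length," is the classical Spearman-Brown step-up. A minimal sketch of that computation (function names and the toy item matrix are ours):<br />

```python
def pearson_r(x, y):
    # Plain Pearson correlation, with no external libraries.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def split_half_reliability(item_matrix):
    # item_matrix: one row of 0/1 item scores per examinee.
    # Correlate odd-item and even-item half-test scores, then step the
    # half-test correlation up to full test length with Spearman-Brown:
    # r_full = 2 * r_half / (1 + r_half).
    odd = [sum(row[0::2]) for row in item_matrix]
    even = [sum(row[1::2]) for row in item_matrix]
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)
```

For the hands-on tests the same idea applies with task scores in place of items, corrected to the number of tasks.<br />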
544
Table 3 shows the intercorrelations among the instruments across the<br />
nine MOS. For the rating scales and, to a lesser degree, for the written<br />
tests, there are high correlations across the different job components. This<br />
may indicate that the test mode itself is responsible for much of the observed<br />
variance. Because the raters for a soldier were the same individuals for all<br />
three sets of scales, we would expect the results to be correlated; likewise,<br />
we expect scores on written multiple-choice tests to be correlated simply<br />
because of the cognitive processing burden imposed by the written material.<br />
The job sample tests, on the other hand, are less affected by the similarity<br />
of method, not surprising in view of the fact that nearly every job sample<br />
exercise (hands-on task test or role-play situation) is conducted and scored<br />
by a different administrator.<br />
Table 3<br />
Intercorrelations (Uncorrected) Among Test Modes (Written, Job Sample, and<br />
Rating Scales) and Job Components (Supervisory, Common Task, and MOS-Specific)<br />
                                        Written Mode       Job Sample Mode     Ratings Mode<br />
                                        Sup.  Com.  MOS    Sup.  Com.  MOS    Sup.  Com.  MOS<br />
WRITTEN MODE<br />
Supervisory (Situational Test)          1.00<br />
Common Task (Job Knowledge Test)         .40  1.00<br />
MOS-Specific (Job Knowledge Test)        .34   .48  1.00<br />
JOB SAMPLE MODE<br />
Supervisory (Role-Plays)                 .12   .19   .13   1.00<br />
Common Task (Hands-On Test)              .09   .30   .20    .10  1.00<br />
MOS-Specific (Hands-On Test)             .11   .23   .42    .06   .17  1.00<br />
RATINGS MODE<br />
Supervisory (Army-Wide Ratings)          .17   .13   .13    .10   .08   .08   1.00<br />
Common Task (Army-Wide Ratings)          .13   .12   .09    .07   .06   .09    .71  1.00<br />
MOS-Specific (MOS Ratings)               .11   .15   .12    .05   .07   .09    .74   .64  1.00<br />
The correlations between different test modes measuring the same job<br />
components are highlighted in the table. The correlations between the two<br />
task-based instruments (job knowledge tests and hands-on tests) are relatively<br />
high even across the job components of common tasks and MOS-specific tasks.<br />
At the same time, the cognitive aspects of supervisory activities seem to be<br />
related to observed supervisory skill (ratings) to a greater degree than to<br />
job samples of supervisory behaviors. It appears that, for common and MOS<br />
tasks, knowing how to perform and being able to perform are more highly<br />
related than either of those is to actually performing on the job. However,<br />
545
for the less easily defined and analyzed supervisory performance, knowing<br />
effective ways to supervise and being rated as a good supervisor are more<br />
highly related than either of those is to demonstrating supervisory skills on<br />
a role-play.<br />
Discussion<br />
Hands-on job sample tests and written job knowledge tests are frequently<br />
used in military performance measurement situations. Whenever we have well-<br />
defined tasks, with unequivocal task analyses that include the initiating cues<br />
and performance standards and that permit the identification of correct and<br />
incorrect actions, we can construct job knowledge tests or job sample tests.<br />
(Whether or not the tests are administrable within available or reasonable<br />
resources is another issue.) These types of tests are widely used because<br />
what they measure -- declarative and procedural knowledge, ability to<br />
perform -- is fairly well-understood. However, the assessment of "typical"<br />
performance (as opposed to ability or knowledge) is more difficult, and the<br />
use of anchored rating scales provides us a method that is arguably less<br />
precise -- but so is the target behavior less precise. Measurement of<br />
supervisory skills has long been regarded as difficult at best. Like<br />
"leadership," these skills are often referred to as "intangible," as though we<br />
are unsure of their existence. The situational judgment tests and the role-<br />
plays are, however, measuring something, and with a respectable degree of<br />
reliability. Continued attention to the development of these instruments, and<br />
to ways of assessing their dimensionality, should yield useful information to<br />
the military testing community.<br />
References<br />
Campbell, J. P., Ed. (October 1987). Improving the Selection, Classification,<br />
and Utilization of Army Enlisted Personnel: Annual Report, 1987 Fiscal<br />
Year (HumRRO Report IR-PRD-88-18). Alexandria, VA: Human Resources<br />
Research Organization.<br />
Campbell, J. P., Ed. (in preparation). Building the Career Force: First<br />
Year Report. Alexandria, VA: Human Resources Research Organization.<br />
546
The USAF Occupational Measurement Squadron:<br />
Its Organization, Products, and Impact<br />
Joan T. Brooks<br />
William J. Carle<br />
Johnnie C. Harris<br />
Paul P. Stanley II<br />
Joseph S. Tartell<br />
USAF Occupational Measurement Squadron<br />
The USAF Occupational Measurement Squadron (USAFOMS) represents the operational<br />
application of two major thrusts in industrial psychology in the Air<br />
Force: personnel testing and occupational analysis. Each of the USAFOMS's<br />
four major programs reflects in its own way how these important technologies,<br />
which began as research efforts, have been applied to real-world problems to<br />
support Air Force mission accomplishment. Out of personnel testing grew the<br />
USAFOMS’s Occupational Test Development Program and the Professional Development<br />
Program. Out of occupational analysis grew the Occupational Analysis<br />
Program and the Training Development Services Program.<br />
A Brief History of the Squadron<br />
In 1970, the implementation of the Weighted Airman Promotion System (WAPS)<br />
triggered the establishment of a new organization within the headquarters of<br />
the Air Training Command (ATC), with the cryptic title of “Detachment 17.”<br />
Detachment 17 consisted of two branches, one responsible for test development,<br />
the other for occupational analysis. In 1974, the Air Force-wide<br />
impact of this organization's missions was recognized when it became the USAF<br />
Occupational Measurement Center. In October 1990, the unit, which is located<br />
at Randolph Air Force Base, Texas, was renamed the USAF Occupational Measurement<br />
Squadron. The USAFOMS Commander also sits on the staff of the Deputy<br />
Chief of Staff for Technical Training as the Director of Occupational Mea-<br />
surement.<br />
The Occupational Test Development Proqram<br />
In the 1950s and 1960s, pencil-and-paper tests were mainly used in training<br />
programs, to assess trainee progress. The implementation of WAPS, however,<br />
made tests a critical factor in enlisted career progression.<br />
The idea of WAPS was to take the mystery out of the promotion system by<br />
making every aspect visible to those competing for promotion. Under WAPS,<br />
airmen compete for promotion to the ranks of staff sergeant (E-5) through<br />
master sergeant (E-7) with other airmen in the same Air Force specialty (AFS)<br />
on the basis of a single score. This single WAPS score is the sum of six<br />
component measures (see Table 1), with USAFOMS tests accounting for up to 44%<br />
of the total. Most airmen take two tests: the Specialty Knowledge Test<br />
(SKT) measures knowledge of the Air Force specialty and the Promotion Fitness<br />
Examination (PFE) tests knowledge of general military subjects. Because the<br />
other, non-test factors typically do little to disperse promotion competi-<br />
tors, the SKT and PFE are often the deciding factor in determining who gets<br />
promoted.<br />
547
Table 1. Weighted Airman Promotion System Factors<br />
FACTOR                          MAXIMUM POINTS    PERCENTAGE VALUE<br />
SKT Score                            100                22%<br />
PFE Score                            100                22%<br />
Time in Service                       40                 9%<br />
Time in Grade                         60                13%<br />
Enlisted Performance Ratings         135                29%<br />
Awards and Decorations                25                 5%<br />
TOTAL                                460               100%<br />
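Since the WAPS score is simply the sum of the six factor scores, the composite can be sketched directly from Table 1. The dictionary keys and function below are our own illustration, not an official computation.<br />

```python
# Maximum point values for the six WAPS factors, from Table 1.
WAPS_MAX = {
    "SKT Score": 100,
    "PFE Score": 100,
    "Time in Service": 40,
    "Time in Grade": 60,
    "Enlisted Performance Ratings": 135,
    "Awards and Decorations": 25,
}

def waps_score(points):
    # The single WAPS score is the sum of the six factor scores;
    # reject any factor value that exceeds its published maximum.
    for factor, value in points.items():
        if value > WAPS_MAX[factor]:
            raise ValueError(f"{factor} exceeds its {WAPS_MAX[factor]}-point maximum")
    return sum(points.values())

# The two USAFOMS tests account for up to 200 of 460 points (about 44%).
test_share = (WAPS_MAX["SKT Score"] + WAPS_MAX["PFE Score"]) / sum(WAPS_MAX.values())
```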
Each promotion test is revised annually in order to prevent compromise and<br />
keep abreast of technological or procedural changes. The tests are con-<br />
structed using the content validity strategy of test development. Three<br />
sources of information are the foundation of content validity for the tests:<br />
the training standard, which lists the specialty's common duties and tasks;<br />
the occupational analysis data provided by USAFOMS’s own Occupational Analysis<br />
Program, which show the relative importance of tasks performed by job<br />
incumbents; and, most important, the experience and knowledge of the sub-<br />
ject-matter experts (SMEs) brought in to write the tests.<br />
The SMEs are senior NCOs selected from throughout the Air Force on the basis<br />
of their job experience in their respective career fields. "Tests Written by<br />
Airmen for Airmen" is a slogan which accurately sums up the USAFOMS test<br />
development philosophy, because these senior NCOs are the heart of the test-<br />
writing process. They provide the technical expertise and USAFOMS psychologists<br />
provide the psychometric expertise to produce job-relevant and statis-<br />
tically sound tests.<br />
While at USAFOMS, each group of SMEs is assigned a test psychologist to lead<br />
them through the test development process. A quality control psychologist<br />
acts as an additional set of eyes, performing exhaustive and minute scrutiny<br />
of all team output. A group of eight test management psychologists oversees<br />
the test development effort as a whole: identifying testing requirements for<br />
their assigned career fields, closely monitoring all events which may affect<br />
testing, ensuring that qualified SMEs are selected, providing guidance to<br />
test writers, and ultimately assuming overall responsibility for the tests<br />
developed.<br />
SMEs spend from 2 to 6 weeks at USAFOMS, depending on the type of test and<br />
the extent of test revision involved. During this time, their questions are<br />
thoroughly researched and reviewed. Each team member has veto power over<br />
each test item. After the SMEs leave, each test is subjected to an additional<br />
20 steps of quality control. The final product is a camera-ready test<br />
manuscript prepared with computerized photocomposition equipment. The manuscript<br />
is forwarded through the Air Force publications distribution system to<br />
be printed and disseminated worldwide through the network of Air Force test<br />
control officers.<br />
548
In addition to SKTs and PFEs, USAFOMS produces USAF Supervisory Examinations<br />
(USAFSEs) and Apprentice Knowledge Tests (AKTs). USAFSEs assess general<br />
supervisory and managerial knowledge and are used in the Senior NCO Promotion<br />
Program, a board-based system used to make selections for promotion to<br />
the ranks of senior master sergeant (E-8) and chief master sergeant (E-9).<br />
AKTs measure the knowledge required for possession of the 3-skill level (also<br />
called the apprentice level) of training. An airman with documented civilian<br />
experience in a specialty may be allowed to bypass resident technical training<br />
with a passing score on the AKT, thus saving the Air Force valuable<br />
training dollars.<br />
In 1989, 700 SMEs were sent TDY to USAFOMS to develop a total of 418 tests.<br />
The Professional Development Proqram<br />
This program, though not strictly an outgrowth of the field of industrial<br />
psychology like the other USAFOMS programs, has had an important positive effect<br />
on acceptance of USAFOMS’s promotion tests. It is Air Force policy that<br />
promotion tests be developed entirely from references that will be available<br />
to all examinees for study. Before 1980, this was a problem with USAFOMS's<br />
most highly visible tests, the Promotion Fitness Examinations and USAF Supervisory<br />
Examinations. These tests were written from a variety of references<br />
which varied in quality and availability. The Professional Development<br />
Program was established to develop a single, high-quality reference upon<br />
which these critical promotion tests could be based.<br />
The reference which evolved was Air Force Pamphlet 50-34. Volume I of the<br />
pamphlet is now the sole source reference for airmen taking the Promotion<br />
Fitness Exam to compete for promotion to staff sergeant, technical sergeant,<br />
and master sergeant. Airmen competing for promotion to senior master ser-<br />
geant and chief master sergeant study both Volume I and Volume II in preparing<br />
to take the USAF Supervisory Exam.<br />
The Occupational Analysis Proqram<br />
In the early 1960s, research performed by the Air Force was to influence<br />
profoundly the field of industrial psychology. Occupational analysis had<br />
been around for many years in various forms, but it was the Comprehensive<br />
Occupational Analysis Data Analysis Programs (collectively called CODAP)<br />
developed by the Personnel Research Laboratory which made possible the study<br />
of jobs on the scale necessary to work with career fields the scope of those<br />
in the Air Force. In 1967, the Job Specialty Survey Division was formed to<br />
apply this technology in the operational setting. It was part of what was<br />
then called Lackland <strong>Military</strong> Training Squadron until Detachment 17 was<br />
formed in 1970.<br />
People in the Occupational Analysis Program conduct surveys of AF personnel,<br />
both military and civilian, to learn what tasks they do regularly on the job.<br />
The Air Force uses the survey results for refining and maintaining occupational<br />
structures within a classification system, for constructing enlisted<br />
promotion tests, for adjusting or establishing training programs, and for<br />
sustaining or modifying other Air Force personnel and research programs. The<br />
occupational survey process consists of six distinct phases, beginning with<br />
the receipt of a request for an occupational survey. Requests for surveys<br />
549
are reviewed by the Priorities Working Group (PWG). In addition to USAFOMS<br />
personnel, the PWG consists of representatives from the Air Force Deputy<br />
Chief of Staff for Personnel, the Air Force Human Resources Laboratory<br />
(AFHRL), the Air Force <strong>Military</strong> Personnel Center (AFMPC), and the ATC technical<br />
and medical training staffs. The PWG selects those specialties which<br />
will be surveyed and assigns relative priorities.<br />
The next step is the development of a job inventory. The job inventory<br />
consists of a comprehensive listing of tasks which may be performed in a<br />
particular occupational field. Inventory developers travel to operational<br />
bases as well as ATC technical training centers for exhaustive interviews<br />
with subject-matter experts. From these interviews, they compile the task<br />
listing and publish it along with background questions as the USAF Job Inventory<br />
for the occupational field under study.<br />
The job inventory is then administered to job incumbents, usually through the<br />
personnel office at each installation. The returned job inventory booklets<br />
undergo a quality control review to correct or eliminate those which have<br />
been improperly completed. Each booklet is reviewed for accuracy and com-<br />
pleteness. This careful quality control of the returned booklets ensures<br />
that the data received are accurate.<br />
Once the booklets are quality controlled, data processing personnel use an<br />
optical scanner to input task responses and background data from returned<br />
inventories into the computer. Computer programming personnel then apply<br />
CODAP programs to create job descriptions and other related products to aid<br />
in data analysis.<br />
Occupational analysts then spend considerable time analyzing the data and<br />
reporting significant trends and implications. USAFOMS publishes the find-<br />
ings and results of the analysis in the form of an Occupational Survey Report<br />
(OSR). The OSR and related data packages are made available to Air Staff,<br />
major commands (MAJCOMs), classification and training personnel, and other<br />
interested Air Force agencies.<br />
The critical final step in the occupational survey process involves working<br />
with the users to apply the data to their particular situation. During this<br />
step, the analyst introduces the user to the data products and gives specific<br />
guidance on how to use the data printouts in making decisions. Once the data<br />
have been analyzed and the OSR has been written and released, the data are<br />
used in a variety of ways. Classification personnel look at career field<br />
structuring, to validate the present structure or recommend restructuring.<br />
USAFOMS psychologists rely heavily on the data to establish the content<br />
validity of enlisted promotion tests. USAFOMS training analysts also use the<br />
data for systems analyses, task analyses, and assessment of education and<br />
training requirements. But perhaps the most visible use of the OSR data to<br />
date is in determining training requirements. In today’s environment, where<br />
the training dollar is tight, training must be geared only to what the person<br />
will need to do the job effectively. In this regard, the emphasis today is<br />
placed on determining how job incumbents will be used in the first job assignment,<br />
identifying those tasks for which the probability of performance by<br />
airmen in their first assignment is high, and providing initial training on<br />
these tasks. OSR data are the key to designing initial courses that train<br />
550
only for the first job, as well as providing valuable information for what to<br />
include in follow-on training.<br />
The Traininq Development Services Proaram<br />
The Training Development Services Program was established in 1982 to improve<br />
Air Force training by using a systematic approach to training development.<br />
The program goal is to enable customers to provide "Quality Training for a<br />
Quality Force." Training analysts are located at Randolph AFB and at each of<br />
the six technical training centers. Their primary function is to provide<br />
front-end task and training analysis to support Air Force instructional<br />
system development (ISD) requirements. The analysis focuses mainly on the<br />
second step of ISD, "Define Education and Training Requirements." The end<br />
result is an analysis of the training requirements of an Air Force specialty<br />
and a plan for structuring and integrating all training within that<br />
specialty.<br />
The primary product of the Training Development Services Program is the<br />
Training Requirements Analysis (TRA). This document consists of three sec-<br />
tions:<br />
1) Systems Overview. This section provides the user with background<br />
information on the specialty with special emphasis on training needs and<br />
issues. This section lists all training presently available and points out<br />
anticipated changes within the career field such as the acquisition of new<br />
equipment. Data for this section comes from the Air Force <strong>Military</strong> Personnel<br />
Center, functional managers, training managers, and other staff-level organizations.<br />
2) Comprehensive Task Analysis. Each important task of the specialty is<br />
broken down into the skills and knowledge required to do the task. Also<br />
included are the tools, equipment, references, conditions, and performance<br />
standards for each task. This information is obtained through extensive<br />
one-on-one interviews with individuals who are fully qualified in the specialty.<br />
3) General and Specific Training Recommendations. The general recommen-<br />
dations relate to broad training issues such as the development of a new<br />
course or the merger of two or more specialties. On the other hand, specific<br />
recommendations are given task by task and describe where and when a task<br />
should be trained based on field data, task analysis data, and occupational<br />
survey data. The “where” is typically either at a technical training center<br />
or through on-the-job training. The “when” indicates whether a task should<br />
be taught during entry-level training or at a later time in a person’s ca-<br />
reer. Specific training recommendations are often produced in the form of a<br />
proposed Specialty Training Standard; however, the format is varied to meet<br />
the needs of the user.<br />
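The task-by-task recommendations described above can be pictured as simple records. The following is a hypothetical sketch only; the field names and sample values are illustrative and do not reflect an actual TRA format:

```python
from dataclasses import dataclass

@dataclass
class TaskRecommendation:
    """One task-by-task training recommendation, as described above."""
    task: str      # the task under consideration
    where: str     # "technical training center" or "on-the-job training"
    when: str      # "entry-level" or "later in career"
    basis: tuple   # data sources supporting the recommendation

# Hypothetical example record
rec = TaskRecommendation(
    task="Inspect hydraulic lines",
    where="on-the-job training",
    when="entry-level",
    basis=("field data", "task analysis data", "occupational survey data"),
)
print(rec.where, "/", rec.when)
```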
A TRA begins with a request from a specialty representative, usually at the<br />
Air Staff or MAJCOM level. TRAs may be requested in conjunction with an<br />
occupational survey (a product of the Occupational Analysis Program) or as a<br />
follow-on to a previous survey in order to address specific training issues<br />
and concerns. Approved TRAs are listed in the USAF Program Technical Training<br />
document. After approval, a team of training analysts is assigned the<br />
project. Usually analysts from more than one location will work on a<br />
project in order to reduce travel costs. Data gathering involves extensive<br />
interviews and observation of skilled specialty technicians. Analysts will<br />
draw from the experience of specialty instructors at technical training<br />
centers and will also travel to various MAJCOMs and bases that employ personnel<br />
in the specialty under study. Specific locations are determined through<br />
meetings with functional managers and will include enough bases to ensure a<br />
thorough sampling. Travel is confined to the continental United States<br />
unless unique specialty units are located overseas. The detailed task infor-<br />
mation gathered during these trips is collected with laptop computers and<br />
entered into automated files. The data then become the basis for the train-<br />
ing requirements summarized in the TRA.<br />
Analysts track the results of each TRA through an extensive external evalua-<br />
tion program that routinely surveys product recipients. User information<br />
received over the past two years indicates TRAs are extremely useful and<br />
serve many purposes. Enlisted personnel account for 80% of the users, which<br />
is understandable since most analyses are conducted on enlisted specialties.<br />
Civilians account for 15% while officers account for the remaining 5%. TRAs<br />
are typically used to develop or revise OJT programs, produce specialty<br />
training standards and criterion objectives, justify the procurement of<br />
training resources, and standardize training programs. TRAs have also been<br />
used to support career field mergers, determine cross utilization of training<br />
programs, and to support Utilization and Training Workshops (U&TWs).<br />
Conclusion<br />
USAFOMS programs impact virtually every aspect of today's Air Force:<br />
determining service entry criteria, setting aptitude requirements for occupational<br />
specialties, establishing criteria for job-specific training programs,<br />
and providing the foundation for a fair and objective promotion system. The<br />
future holds new challenges as well, including the possibility of an on-line<br />
occupational information system which permits analysis across weapon systems,<br />
training information which supports both large- and small-scale programs<br />
which are job-specific and cost-effective, and continued improvement of the<br />
promotion system. Key to all future developments is the recognition that<br />
success has come from the operational application of research in industrial<br />
psychology and improvements must follow a similar track: research and validation<br />
prior to implementation.<br />
The Examiner is a sophisticated computer-based system used in the development of both paper and pencil<br />
and computer-delivered examinations. Over 200 installations world-wide make use of the system in<br />
applications ranging from traditional classroom tests to the evaluation of sailors in submarines at sea.<br />
This paper will give a brief history of The Examiner, describe the structure of the system, and give some<br />
suggested implementations.<br />
The Evolution of The Examiner<br />
A number of expensive and complex mainframe testing systems existed in 1984 when Dr. Stanley Trollip and<br />
I decided to put our development and programming experience to work to create a microcomputer-based<br />
testing system. We developed a small prototype system and showed it to a number of prospective customers<br />
in hope of receiving funding to develop it into a full-fledged program.<br />
Through some business acquaintances in England, we learned that the London Stock Exchange was<br />
overhauling their centuries-old brokerage system and were moving towards a certified representative system<br />
similar to what we have here in the United States. As part of this process the Exchange decided that they<br />
wanted a comprehensive computer-based testing system developed to meet their certification needs. The<br />
design criteria they specified were:<br />
The system had to be secure. Item bank encryption and database password access had to ensure<br />
that the test items did not "escape".<br />
The system had to be reliable. Accuracy in testing was important for its own sake. Further, regulations<br />
in the UK made it essential that there be no errors in recording answers or reporting grades of<br />
examinees.<br />
The system had to be easy to use. From a development standpoint, the system should be easily<br />
operated by clerical staff. From the examinee standpoint, computer neophytes must not find the<br />
software impairing their test-taking ability in any way.<br />
Dr. Trollip and I convinced the Exchange that we could develop a software product for them that could meet<br />
their needs, and that we could develop it for them on budget and in time for their "Big Bang" deregulation in<br />
the Fall of 1985. We succeeded, and the software product The Examiner was first used in October of<br />
1985. Since then, every stockbroker in the United Kingdom and Ireland has been certified using our system.<br />
Over 1,000 tests a year have been given.<br />
The Examiner is designed to:<br />
Produce both computer-delivered and traditional paper and pencil tests from a single database.<br />
Provide a framework that can produce simple spot quizzes or complex qualification tests.<br />
Provide item and test feedback allowing the integration of training into the examination process.<br />
Track item statistics for the improvement of item bank quality.<br />
Track examinee statistics for individual and class reporting purposes.<br />
Item Editor The item bank is created in this part of the system. The structure of the database is<br />
developed to allow for accurate testing of different subject areas.<br />
Exam Editor Once the item bank has been created, the examination editor is used to create "profiles" that<br />
will be used as templates to create tests.<br />
Exam Delivery Tests are delivered in a secure environment with a user interface designed to allow the<br />
assessment of examinee knowledge rather than test-taking ability. Paper and pencil tests are<br />
cleanly printed for maximum clarity and legibility.<br />
Statistics Completed examination records can be viewed for both examinee results and item analysis.<br />
The Item Bank Structure<br />
One of the great powers of The Examiner is that the software naturally leads the developer into organizing<br />
items into a logically-structured item bank. This rationally structured item bank allows the construction of<br />
tests that can evaluate specific learning objectives or broad knowledge areas.<br />
Multiple Choice Up to 10 alternatives are available.<br />
Multiple Correct Each alternative has its own grading weight, with item mastery based on<br />
achieving a set sum of correct alternatives.<br />
Dynamic Multiple Choice Up to 10 alternatives are available. Items are constructed dynamically at<br />
examination generation time by selecting a preselected number of correct and<br />
incorrect alternatives.<br />
Dual A special form of multiple choice item created at examination generation time<br />
from a set of four alternatives, two correct and two incorrect.<br />
Short Alpha Answer Up to ten words can be judged at one time. Misspellings, extra words, incorrect<br />
word order, and capitalization errors can be allowed or disallowed at will.<br />
Short Numeric Answer A floating point number can be judged. Exact matching or plus and minus an<br />
absolute number or percentage of error can be allowed.<br />
Linked Up to 99 items can be linked together into a "scenario" type of item. Mastery<br />
of the linked item can be based on mastery of all or part of the included items.<br />
Parallel Up to 99 items can be grouped together as a parallel item. At examination<br />
generation time, the system will randomly select one of the parallel items for<br />
inclusion in the examination.<br />
The means of examination delivery will often determine the type of questions that are to be used.<br />
If computer-delivered or manually-graded paper and pencil examinations are to be used, then any of these<br />
item types can be used. If machine-graded paper and pencil examinations are anticipated, then multiple<br />
choice is the usual choice.<br />
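The judging rules for the short-answer item types above can be sketched in a few lines. This is a hypothetical illustration of the described behavior, not The Examiner's actual code; the function names and parameters are invented for the sketch:

```python
def judge_numeric(response, key, abs_tol=None, pct_tol=None):
    """Judge a floating-point answer: exact match, or within an
    absolute or percentage tolerance, as the item author allows."""
    if abs_tol is not None:
        return abs(response - key) <= abs_tol
    if pct_tol is not None:
        return abs(response - key) <= abs(key) * pct_tol / 100.0
    return response == key

def judge_alpha(response, key, allow_case_errors=True, allow_word_order=False):
    """Judge a short word answer; capitalization and word-order
    errors can be allowed or disallowed at will."""
    r, k = response.split(), key.split()
    if allow_case_errors:
        r, k = [w.lower() for w in r], [w.lower() for w in k]
    if allow_word_order:
        return sorted(r) == sorted(k)
    return r == k

print(judge_numeric(10.4, 10.0, abs_tol=0.5))   # within absolute tolerance
print(judge_alpha("orange BEACH", "Orange Beach"))  # case errors allowed
```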
Item Entry<br />
An integrated word-processor allows clerical staff to enter items into The Examiner. A judicious use of<br />
preformatting enables The Examiner editor to produce correctly entered items every time.<br />
If an item bank is already in existence, a Text Import Utility can be used to load items into an Examiner<br />
database. Mainframe-based item banks can be successfully migrated into the Examiner's environment with a<br />
minimum of expensive "hands-on" intervention.<br />
Examination Development<br />
Once an item bank has been created, the developer can create examinations from all or part of the bank.<br />
Item selection can range from the selection of specific items to random selection.<br />
The Examiner is unique in its use of profiles. A profile is a set of directives that tells the Examiner<br />
test-generating software how to extract items out of the database and create an examination. The Examiner<br />
doesn't store tests. Rather, it stores profiles that allow tests to be created on-demand by accessing the<br />
profile. This makes Examiner databases totally self-contained, allowing the creation of unique, yet<br />
equivalent, examinations at any time.<br />
Profiles contain two main sets of specifications:<br />
Global Specifications These are sets of parameters that affect things such as the number of items to be<br />
shown in the examination, the pass mark for the examination, the difficulty of the<br />
examination, and the way that the examination is to appear to the examinee.<br />
Item Specifications These are the criteria that determine which items are to be extracted from<br />
the item bank for the examination. At the simplest, a profile might call for random item<br />
selection and random presentation of multiple-choice alternatives. At the other end, the profile for a<br />
complex certification examination can be created that will yield a test of a given difficulty level to test very<br />
specific areas of knowledge.<br />
[Figure: a sample item bank organized as a tree. "All Questions in Database" branches into History (1.0.0)<br />
and Geography (2.0.0). History contains People (1.1.0) and States (1.2.0); Geography contains Cities (2.1.0),<br />
Seas (2.2.0), and Islands (2.3.0). Individual questions (1.1.1, 1.1.2, 1.2.1, 1.2.2, 2.1.1, 2.1.2, 2.1.3, 2.2.1,<br />
2.2.2, 2.3.1) hang from these lowest-level categories.]<br />
The above illustration gives an example of the type of sophisticated item selection criteria that can be used in<br />
Examiner profiles. In this example, the profile has been designed so that a six-item test will be generated.<br />
The item selection criteria have been set so that:<br />
1) All examinees will get item 1.1.1.<br />
2) Two items will be selected from area 2.1.0. In this case, the random selection process selected 2.1.2<br />
and 2.1.3. It could just as well have been any two items from 2.1.0.<br />
3) Everyone will get item 2.3.1.<br />
4) The rest of the test will be completed with items from 1.2.0. In this case, items 1.2.1 and 1.2.2 were<br />
selected.<br />
In addition to specifying item selection by objective classification, profiles can specify selection by difficulty,<br />
item type, and item characteristic. The Examiner will attempt to produce a test that matches the<br />
requested characteristics as closely as possible.<br />
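The six-item example above can be sketched as a profile of directives over a small item bank. This is an illustrative sketch only; the bank and profile data structures are invented here and do not show The Examiner's actual profile format:

```python
import random

# Hypothetical item bank keyed by objective code, mirroring the figure.
bank = {
    "1.1.0": ["1.1.1", "1.1.2"],
    "1.2.0": ["1.2.1", "1.2.2"],
    "2.1.0": ["2.1.1", "2.1.2", "2.1.3"],
    "2.2.0": ["2.2.1", "2.2.2"],
    "2.3.0": ["2.3.1"],
}

# A profile is a set of directives, not a stored test.
profile = [
    ("fixed", "1.1.1", 1),    # all examinees get item 1.1.1
    ("random", "2.1.0", 2),   # two items drawn at random from area 2.1.0
    ("fixed", "2.3.1", 1),    # everyone gets item 2.3.1
    ("random", "1.2.0", 2),   # the rest of the six-item test from 1.2.0
]

def generate_test(bank, profile, rng=random):
    """Create a unique yet equivalent test on demand from the profile."""
    items = []
    for kind, ref, count in profile:
        if kind == "fixed":
            items.append(ref)
        else:
            items.extend(rng.sample(bank[ref], count))
    return items

print(generate_test(bank, profile))
```

Because only the profile is stored, each call produces a fresh, statistically equivalent test from the same self-contained database.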
Examination Delivery<br />
The Examiner is unique in its ability to produce both computer-delivered and paper and pencil examinations<br />
from the same item bank. The developer is given the ability to select the delivery mode that is the most<br />
appropriate for their testing needs.<br />
Paper and Pencil Testing<br />
The Examiner can easily produce multiple forms of paper and pencil tests. By the introduction of random<br />
selection options into the profile used to generate the test, a user can produce two statistically equivalent<br />
tests on the same subject area. If desired, a unique test could be generated for each examinee. Paper and<br />
pencil tests can be scored in three fashions:<br />
Manually Student and instructor's answer keys can be printed with each test. In many<br />
instances, hand grading using these forms is acceptable.<br />
Stand-Alone Machine Graded A "pre-slugged" answer sheet compatible with the stand-alone Scantron 888<br />
optical scanner can be produced on an HP LaserJet II printer. This allows the<br />
easy scoring of multiple examinations on readily available hardware.<br />
Data-Terminal Scanning An answer key is stored internally in The Examiner's database and can be used<br />
to grade examinee answer sheets. Complete examination and item statistics are<br />
stored when data terminal scanners are used. At present, the Scantron<br />
1300/1400 series scanners are supported. In 1st Quarter 1991, support for the<br />
Scantron 8000/8200 series scanners will be added.<br />
Print options of The Examiner are currently being enhanced, and these new features will be released in the<br />
first quarter of 1991. Supported features will be:<br />
Printers Initial support for 20 of the most common printers. On customer request, the printer<br />
support files will be expanded to include additional printers.<br />
Fonts Depending on the printer, font control will be added. With the HP LaserJet printer,<br />
numerous font cartridges will be supported in addition to the default internal fonts.<br />
Highlighting Bold, italic, underlining, superscripts, and subscripts will be available on printers that<br />
support those features.<br />
Graphics Printed items will include PC-Paintbrush(TM) images that are currently only available with<br />
computer-delivered examinations. Printing will be limited to those printers that support<br />
graphic printing.<br />
Suggested Implementations<br />
Paper and pencil tests can be made available to examinees under a number of different delivery environments.<br />
The "unbundling" of parts of The Examiner makes it possible to have non-technical clerical staff<br />
produce on-demand tests at remote sites. Three possible test creation/delivery scenarios are:<br />
Local Control Tests are created using the main Examiner system and are graded at the development site.<br />
Copies of the item bank remain secure in one place and test creation access is tightly<br />
limited. Tests can be mailed to remote sites and then returned to the central site for<br />
grading.<br />
Networked Using the network version of the stand-alone examination generator, remote sites can locally<br />
generate tests and score them. Without access to the editing programs, the security of the<br />
database is maintained while providing the convenience of simultaneous multiple access to<br />
the items.<br />
Remote Copies of the database are distributed to the remote sites, and the stand-alone examination<br />
generator is used to produce the tests. Grading is done locally.<br />
Of course, numerous variations on these basic themes are possible to offer a delivery environment<br />
appropriate for the unique delivery requirements.<br />
Computer-Delivered Testing<br />
Examinations can be delivered via computer using The Examiner's administration software. Sample<br />
examinations allow the examinee to become familiar with the testing software so that the administration<br />
system tests knowledge rather than computer test-taking ability.<br />
Basic options available within the administration system are:<br />
Sequencing Examinees can be required to answer each item before they see the next item, and are not<br />
allowed to review and change their items. Or, examinees can move within the examination<br />
at will, changing their answers until complete examination scoring is requested.<br />
Feedback Real-time student mastery feedback can range from none at all to detailed feedback at the<br />
multiple-choice alternative level. Full-test mastery criteria and results can be activated when<br />
appropriate.<br />
Randomization Item presentation order can be random or fixed. Within items, multiple choice alternative<br />
order can be randomized.<br />
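Randomizing alternative order while keeping grading accurate can be sketched as follows; this is a hypothetical illustration (the function and the sample item are invented), not The Examiner's actual administration code:

```python
import random

def randomize_item(alternatives, correct_index, rng=random):
    """Shuffle multiple-choice alternative order and return the new
    position of the correct answer so grading stays accurate."""
    order = list(range(len(alternatives)))
    rng.shuffle(order)
    shuffled = [alternatives[i] for i in order]
    return shuffled, order.index(correct_index)

# Hypothetical item: first alternative is correct.
alts = ["Montgomery", "Mobile", "Huntsville", "Birmingham"]
shuffled, key = randomize_item(alts, 0)
print(shuffled[key])  # always the original correct alternative
```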
Examiner tests are DOS files and can be moved from one machine to another by any number of methods.<br />
Floppy disks, local area networks, and distributed communication networks are all possible.<br />
The Examiner is a sophisticated computer-based examination system that can meet most testing needs of<br />
both large and small organizations. Its ability to deliver both paper and pencil and computer-based<br />
examinations from the same database gives it a unique power in an area where testing needs can sometimes<br />
change with great rapidity. Evolving to meet users' needs, The Examiner is a cost-effective off-the-shelf<br />
solution for the evaluation of examinees and students.<br />
For information contact:<br />
Media Computer Enterprises, Ltd.<br />
880 Sibley Memorial Highway, Suite 102<br />
Mendota Heights, MN 55118-1708 USA<br />
Phone: 612-451-7360<br />
FAX: 612-451-6563<br />
32ND ANNUAL CONFERENCE OF THE MILITARY TESTING ASSOCIATION<br />
ORANGE BEACH, ALABAMA<br />
5-9 NOVEMBER 1990<br />
Minutes of the Steering Committee Meeting<br />
5 November 1990<br />
The meeting of the Steering Committee for the 32nd Annual<br />
Conference of the Military Testing Association was held in the<br />
Sand Castle (1) Room of the Perdido Hilton Hotel, Orange Beach,<br />
Alabama.<br />
MEMBERS AND ATTENDEES. See the List of Steering Committee<br />
Meeting Attendees, which follows these minutes as Attachment 1.<br />
1. The meeting was called to order at 0930 hours by CDR M. R.<br />
Adams, 1990 Chairperson.<br />
2. The financial report reflected a sharp impact from the recent<br />
economic budget problems: last year over 350 people were<br />
registered and for 1990, 300 were expected. Because of the<br />
sizeable funds passed to NETPMSA from the 1989 hosts and the<br />
expected number of attendees, the registration costs were<br />
substantially reduced. There have been many cancellations and<br />
the current estimate of registered attendees is 140. (As of 9<br />
November, there were 165 registered attendees.)<br />
3. Future conference locations were discussed:<br />
(a) The Federal Republic of Germany received a letter signed<br />
by OASD (FM&P) stating that support for U.S. participation at the<br />
'91 MTA Conference would be given. However, NETPMSA, as the '90<br />
MTA host, recommended review of the next site due to the budget<br />
difficulties encountered and the financial planning problems for<br />
the host. Several discussions followed regarding funding<br />
cutbacks expected and difficulties in promising good attendance<br />
numbers in Germany. Until the testing/research budgets finalize,<br />
it was agreed to defer Germany as the guest host. A Spring<br />
versus Fall time period was also discussed, but the group decided<br />
to leave the conference as an expected Fall budget item, even if<br />
attendance went down. The USAF Occupational Measurement Squadron<br />
tentatively agreed to host the 1991 MTA Conference in San<br />
Antonio, Texas, site of the 1989 conference. (That 1991 site was<br />
confirmed on 6 November 1990).<br />
(b) The Navy Personnel Research and Development Center will<br />
host the 1992 MTA Conference in San Diego, California.<br />
(c) The Coast Guard will host the 1993 MTA Conference in<br />
Williamsburg, Virginia.<br />
(d) The Federal Republic of Germany will host the 1994 MTA<br />
Conference in Germany in conjunction with Naval Research in<br />
London, England.<br />
(e) Canada will host the 1995 MTA Conference.<br />
4. There was general discussion on the submission of abstracts<br />
and the difficulty In getting them in a timely way. Many members<br />
felt the Steering Committee members should be more forceful in<br />
the association and possibly require a committee member screen on<br />
presentations. This would assist with quality and timeliness.<br />
Some members felt there should be greater recruitment for topics<br />
from the production/development areas since research is already<br />
so well represented.<br />
5. Regarding the Harry H. Greer Award, there was discussion<br />
about an Awards Committee being established, as mentioned in the<br />
charter, to provide more structure and coverage in getting more<br />
good nominations. The general opinion was that the current<br />
method of presenting nominations to the current chairman for<br />
further opinion is sufficient. However, the committee members<br />
all agreed that nominations should be specific in detail<br />
regarding the currency and degree of the nominee's involvement<br />
with the Military Testing Association: professional contributions<br />
in research/production; published material, etc.<br />
6. There was general agreement that the 1989 carry-over topic of<br />
"MTA name change" should be dropped from future MTA Steering<br />
Committee meeting agendas. This has been a repeated item and the<br />
historic continuity value of the current title is most important.<br />
M. R. ADAMS, CDR, USN<br />
1990 Chairperson<br />
Canadian Forces Personnel Applied<br />
Research Unit<br />
National Defence Headquarters<br />
Canadian Forces Directorate of<br />
Military Occupational Structures<br />
Federal Ministry of Defense<br />
Federal Republic of Germany<br />
MOD Science 3 (AIR)<br />
Ministry of Defence<br />
United Kingdom<br />
Royal Australian Air Force<br />
Royal Netherlands Army<br />
SEC PSY OND/CRS<br />
Belgian Armed Forces<br />
Naval Education and Training Program<br />
Management Support Activity (NETPMSA)<br />
Naval <strong>Military</strong> Personnel Command<br />
Navy Occupational Development and<br />
Analysis Center (NODAC)<br />
Navy Personnel Research and<br />
Development Center (NPRDC)<br />
Defense Activity for Non-Traditional<br />
Education Support (DANTES)<br />
U.S. Air Force Human Resources Laboratory<br />
U.S. Air Force Occupational Measurement<br />
Squadron<br />
U,S. Army Research Institute (PERI-RG)<br />
U.S. Coast Guard Headquarters (G-PWP-2)<br />
OBSERVERS:<br />
Air Traffic Services Transport Canada<br />
Chief of Naval Operations<br />
1990 MTA STEERING COMMITTEE MEETING ATTENDEES<br />
CDR Frederick F.P. Wilson<br />
COL James C. Fleming<br />
Mr. G. J. (Jeff) Higgs<br />
COL Terry J. Prociuk<br />
Martin L. Rauch<br />
(Represented by<br />
LTCOL John Birkbeck,<br />
MOD A ED 4)<br />
Squadron Leader John S. Price<br />
COL Dr. Ger J.C. Roozendaal<br />
CAPT Francois J.M.E. Lescreve<br />
CDR Mary R. Adams<br />
CAPT Edward L. Naro<br />
Dr. Alain Hunter<br />
Mr. William A. Sands<br />
Roger G. Goldberg<br />
Dr. Lloyd D. Burtch<br />
J. S. Tartell<br />
Dr. Timothy W. Elig<br />
Richard S. Lanterman<br />
J. R. Dick Campbell<br />
Mr. Charles R. Hoshaw<br />
Attachment 1
ORGANIZATION<br />
ROYAL AUSTRALIAN AIR FORCE:<br />
Royal Australian Air Force<br />
U.S. Air Force Human Resources<br />
Laboratory (AFHRL/MOD)<br />
Brooks AFB, TX 78235-5000<br />
U S A<br />
A/V 240-3640 COM: (512) 536-3648<br />
BELGIAN ARMED FORCES,<br />
SEC PSY OND/CRS:<br />
SEC PSY OND/CRS<br />
Bruynstraat<br />
B-1120 Brussels<br />
Belgium<br />
2 2680050, Ext. 3279<br />
CANADIAN FORCES DIRECTORATE OF<br />
MILITARY OCCUPATIONAL STRUCTURES:<br />
Canadian Forces Directorate of<br />
Military Occupational Structures<br />
National Defence Headquarters<br />
101 Colonel By Drive<br />
Ottawa, Ontario K1A 0K2<br />
Canada<br />
MILITARY TESTING ASSOCIATION<br />
STEERING COMMITTEE MEMBERS<br />
AUSTRALIA<br />
BELGIUM<br />
CANADA<br />
1990 REPRESENTATIVE<br />
Squadron Leader John S. Price<br />
(Squadron Leader Kerry J. McDonald will<br />
be 1991 representative)<br />
CAPT Francois J.M.E. Lescreve<br />
COL James C. Fleming<br />
(A/V 642-3507 COM: (613) 992-3507)<br />
Mr. G. J. (Jeff) Higgs<br />
(A/V 842-7069 COM: (613) 922-7069)
ORGANIZATION 1990 REPRESENTATIVE<br />
DIRECTOR OF PERSONNEL PSYCHOLOGY AND SOCIOLOGY:<br />
Director of Personnel Psychology and<br />
Sociology<br />
National Defense Headquarters<br />
Ottawa, Ontario K1A 0K2<br />
Canada<br />
Canadian Forces<br />
A/V 842-0244 COM: (613) 992-0244<br />
CANADIAN FORCES PERSONNEL APPLIED RESEARCH UNIT:<br />
Canadian Forces Personnel Applied<br />
Research Unit<br />
4900 Yonge St., Suite 600<br />
Willowdale, Ontario M2N2Z4<br />
Canada<br />
Canadian Forces<br />
COM: (416) 224-4964<br />
FEDERAL MINISTRY OF DEFENSE:<br />
COL Terry J. Prociuk<br />
CDR Frederick F.P. Wilson<br />
FEDERAL REPUBLIC OF GERMANY<br />
Federal Ministry of Defense, P II 4<br />
Postfach 1328<br />
5300 Bonn 1<br />
Federal Republic of Germany<br />
Federal Ministry of Defense<br />
COM: 49-228-128543<br />
FEDERAL REPUBLIC OF GERMANY AIR FORCE:<br />
Federal Republic of Germany Air Force<br />
Wehrbereichsverwaltung II<br />
V-4-Psychology Angelegenheiten<br />
Hans-Böckler-Allee 16, 3000 Hannover<br />
05 11-531-26 08126 03<br />
Martin L. Rauch<br />
Wolfgang Weber<br />
ORGANIZATION 1990 REPRESENTATIVE<br />
ROYAL NETHERLANDS ARMY:<br />
Royal Netherlands Army<br />
DPKL/AFD GW<br />
Postbus 90701<br />
2509 LS The Hague<br />
The Netherlands<br />
COM: 31-71-6135450<br />
MOD SCIENCE 3 (AIR):<br />
MOD Science 3 (AIR)<br />
Lacon House<br />
Theobalds Road<br />
London, WC1X 8RY<br />
England<br />
U.S. AIR FORCE<br />
THE NETHERLANDS<br />
THE UNITED KINGDOM<br />
COL Dr. Ger J.C. Roozendaal<br />
Eugene F. Burke<br />
(Represented in 1990 by<br />
COL John Birkbeck<br />
MOD A ED 4<br />
Court Road<br />
Eltham<br />
London SE9 5NR<br />
United Kingdom)<br />
UNITED STATES OF AMERICA<br />
U.S. AIR FORCE HUMAN RESOURCES LABORATORY<br />
(AFHRL):<br />
U.S. Air Force Human Resources Laboratory<br />
(AFHRL/PR)<br />
Brooks AFB, TX 78235-5601<br />
USA<br />
(A/V 240-3011 COM: (512) 536-3611)<br />
U.S. Air Force Human Resources Laboratory<br />
(AFHRL/CC)<br />
Brooks AFB, TX 78235-5000<br />
USA<br />
Dr. Lloyd D. Burtch<br />
COL Harold G. Jensen
ORGANIZATION 1990 REPRESENTATIVE<br />
U.S. AIR FORCE OCCUPATIONAL MEASUREMENT SQUADRON<br />
(OMS):<br />
U.S. Air Force Occupational<br />
Measurement Squadron (OMS)<br />
Randolph AFB, TX 78150-5000<br />
USA<br />
DSN 487-6623 COM: (512) 652-6623<br />
U.S. ARMY RESEARCH INSTITUTE (PERI-RG):<br />
U.S. Army Research Institute (PERI-RG)<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 354-5786 COM: (703) 274-5610<br />
U.S. COAST GUARD<br />
U.S. COAST GUARD HEADQUARTERS:<br />
U.S. Coast Guard Headquarters<br />
Chief, Occupational Standards<br />
(G-PWP-2), Room 4111<br />
2100 Second St., S.W.<br />
Washington, DC 20593-0001<br />
USA<br />
COM: (202) 267-2986<br />
U.S. NAVY<br />
NAVAL EDUCATION AND TRAINING PROGRAM<br />
MANAGEMENT SUPPORT ACTIVITY (NETPMSA):<br />
Naval Education and Training Program<br />
Management Support Activity (NETPMSA)<br />
(Code 03)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1685 COM: (904) 452-1685<br />
J. S. Tartell<br />
Dr. Timothy W. Elig<br />
Richard S. Lanterman<br />
CDR Mary Adams<br />
Dr. James M. Lentz<br />
ORGANIZATION<br />
NAVAL MILITARY PERSONNEL COMMAND NAVY OCCUPATIONAL<br />
DEVELOPMENT AND ANALYSIS CENTER (NODAC):<br />
Naval <strong>Military</strong> Personnel Command<br />
Navy Occupational Development and<br />
Analysis Center (NODAC)<br />
Bldg. 150, WNY (Anacostia)<br />
Washington, DC 20374-1501<br />
USA<br />
NAVY PERSONNEL RESEARCH AND DEVELOPMENT CENTER<br />
(NPRDC):<br />
Navy Personnel Research and<br />
Development Center (NPRDC)<br />
<strong>Testing</strong> Systems Department (Code 13)<br />
San Diego, CA 92152-6800<br />
USA<br />
A/V 553-9266 COM: (619) 553-9266<br />
DEFENSE ACTIVITY FOR NON-TRADITIONAL EDUCATION<br />
SUPPORT (DANTES):<br />
Defense Activity for Non-Traditional<br />
Education Support (DANTES)<br />
Pensacola, FL 32509-7400<br />
USA<br />
A/V 922-1064/1745 COM: 904-452-1063<br />
OFFICE OF ASSISTANT SECRETARY OF DEFENSE FORCE<br />
MANAGEMENT AND PERSONNEL (FM&P):<br />
Office of Assistant Secretary of Defense<br />
Force Management and Personnel (FM&P)<br />
Washington, DC 20301<br />
USA<br />
A/V 227-4166 COM: (202) 697-4166<br />
1990 REPRESENTATIVE<br />
CAPT Edward L. Naro<br />
(A/V 288-5488 COM: (202) 433-5488)<br />
Dr. Alain Hunter<br />
(A/V 288-4620 COM: (202) 433-4620)<br />
Mr. William A. Sands<br />
Roger G. Goldberg<br />
Dr. W. S. Sellman
BY-LAWS OF THE MILITARY TESTING ASSOCIATION<br />
Article I - Name<br />
The name of this organization shall be the <strong>Military</strong> <strong>Testing</strong> <strong>Association</strong>.<br />
Article II - Purpose<br />
The purpose of this <strong>Association</strong> shall be to:<br />
A. Assemble representatives of the various armed services of the United States<br />
and such other nations as might request to discuss and exchange ideas concerning<br />
assessment of military personnel.<br />
B. Review, study, and discuss the mission, organization, operations, and research<br />
activities of the various associated organizations engaged in military personnel<br />
assessment.<br />
C. Foster improved personnel assessment through exploration and presentation<br />
of new techniques and procedures for behavioral measurement, occupational<br />
analysis, manpower analysis, simulation models, training programs, selection<br />
methodology, survey and feedback systems.<br />
D. Promote cooperation in the exchange of assessment procedures, techniques<br />
and instruments.<br />
E. Promote the assessment of military personnel as a scientific adjunct to<br />
modern military personnel management within the military and professional<br />
community.<br />
Article III - Participation<br />
The following categories shall constitute membership within the MTA:<br />
A. Primary Membership.<br />
1. All active duty military and civilian personnel permanently assigned to an<br />
agency of the associated armed services having primary responsibility for assessment<br />
for personnel systems.<br />
2. All civilian and active duty military personnel permanently assigned to an<br />
organization exercising direct command over an agency of the associated armed<br />
services holding primary responsibility for assessment of military personnel.<br />
B. Associate Membership.<br />
1. Membership in this category will be extended to permanent personnel of<br />
governmental organizations engaged in activities that parallel those of the primary<br />
membership. Associate members shall be entitled to all privileges of primary<br />
members with the exception of membership on the Steering Committee. This<br />
restriction may be waived by the majority vote of the Steering Committee.<br />
C. Non-Member Participants.<br />
1. Non-members may participate in the annual conference, present papers<br />
and participate in symposium/panel sessions. Non-members will not attend the<br />
meeting of the Steering Committee nor have a vote in association affairs.<br />
Article IV - Dues<br />
No annual dues shall be levied against the participants.<br />
Article V - Steering Committee<br />
A. The governing body of the <strong>Association</strong> shall be the Steering Committee. The<br />
Steering Committee shall consist of voting and non-voting members. Voting<br />
members are primary members of the Steering Committee. Primary membership<br />
shall include:<br />
1. The Commanding Officers of the respective agencies of the armed services<br />
exercising responsibility for personnel assessment programs.<br />
2. The ranking civilian professional employees of the respective agencies of<br />
the armed services exercising primary responsibility for the conduct of personnel<br />
assessment systems.<br />
3. Each agency shall have no more than two (2) representatives.<br />
B. Associate membership of the Steering Committee shall be extended by<br />
majority vote of the committee to representatives of various governmental organizations<br />
whose purposes parallel those of the <strong>Association</strong>.<br />
C. The Chairman of the Steering Committee shall be appointed by the President<br />
of the <strong>Association</strong>. The term of office shall be one year and shall begin the last day of<br />
the annual conference.<br />
D. The Steering Committee shall have general supervision over the affairs of the<br />
<strong>Association</strong> and shall have the responsibility for all activities of the <strong>Association</strong>.<br />
The Steering Committee shall conduct the business of the <strong>Association</strong> in the interim<br />
between annual conferences of the <strong>Association</strong> by such means of communication as<br />
deemed appropriate by the President or Chairman.<br />
E. Meeting of the Steering Committee shall be held during the annual<br />
conferences of the <strong>Association</strong> and at such times as requested by the President of the<br />
<strong>Association</strong> or the Chairman of the Steering Committee. Representation from the<br />
majority of the organizations of the Steering Committee shall constitute a quorum.<br />
Article VI - Officers<br />
A. The officers of the <strong>Association</strong> shall consist of a President, Chairman of the<br />
Steering Committee and a Secretary.<br />
B. The President of the <strong>Association</strong> shall be the Commanding Officer of the<br />
armed services agency coordinating the annual conference of the <strong>Association</strong>. The<br />
term of the President shall begin at the close of the annual conference of the<br />
<strong>Association</strong> and shall expire at the close of the next annual conference.<br />
C. It shall be the duty of the President to organize and coordinate the annual<br />
conference of the <strong>Association</strong> held during his term of office, and to perform the<br />
customary duties of a president.<br />
D. The Secretary of the <strong>Association</strong> shall be filled through appointment by the<br />
President of the <strong>Association</strong>. The term of office of the Secretary shall be the same as<br />
that of the President.<br />
E. It shall be the duty of the Secretary of the <strong>Association</strong> to keep the records of<br />
the association, and the Steering Committee, and to conduct official correspondence<br />
of the association, and to insure notices for conferences. The Secretary shall solicit<br />
nominations for the Harry Greer award prior to the annual conference. The<br />
Secretary shall also perform such additional duties and take such additional<br />
responsibilities as the President may delegate to him.<br />
Article VII - Meetings<br />
A. The <strong>Association</strong> shall hold a conference annually.<br />
B. The annual conference of the <strong>Association</strong> shall be coordinated by the agencies<br />
of the associated armed services exercising primary responsibility for military<br />
personnel assessment. The coordinating agencies and the order of rotation will be<br />
determined annually by the Steering Committee. The coordinating agencies for at<br />
least the following three years will be announced at the annual meeting.<br />
C. The annual conference of the <strong>Association</strong> shall be held at a time and place<br />
determined by the coordinating agency. The membership of the <strong>Association</strong> shall be<br />
informed at the annual conference of the place at which the following annual<br />
conference will be held. The coordinating agency shall inform the Steering<br />
Committee of the time of the annual conference not less than six (6) months prior to<br />
the conference.<br />
D. The coordinating agency shall exercise planning and supervision over the<br />
program of the annual conference. Final selection of program content shall be the<br />
responsibility of the coordinating organization.<br />
E. Any other organization desiring to coordinate the conference may submit a<br />
formal request to the Chairman of the Steering Committee, no later than 18 months<br />
prior to the date they wish to serve as host.<br />
Article VIII - Committee<br />
A. Standing committees may be named from time to time, as required, by vote of<br />
the Steering Committee. The chairman of each standing committee shall be<br />
appointed by the Chairman of the Steering Committee. Members of standing<br />
committees shall be appointed by the Chairman of the Steering Committee in<br />
consultation with the Chairman of the committee in question. Chairmen and<br />
committee members shall serve in their appointed capacities at the discretion of the<br />
Chairman of the Steering Committee. The Chairman of the Steering Committee<br />
shall be an ex officio member of all standing committees.<br />
B. The President, with the counsel and approval of the Steering Committee, may<br />
appoint such ad hoc committees as are needed from time to time. An ad hoc<br />
committee shall serve until its assigned task is completed or for the length of time<br />
specified by the President in consultation with the Steering Committee.<br />
C. All standing committees shall clear their general plans of action and new<br />
policies through the Steering Committee, and no committee or committee chairman<br />
shall enter into relationships or activities with persons or groups outside of the<br />
<strong>Association</strong> that extend beyond the approved general plan of work without the<br />
specific authorization of the Steering Committee.<br />
D. In the interest of continuity, if any officer or member has any duty elected or<br />
appointed placed on him, and is unable to perform the designated duty, he should<br />
decline and notify at once the officers of the <strong>Association</strong> that he cannot accept or<br />
continue said duty.<br />
Article IX - Amendments<br />
A. Amendments of these By-Laws may be made at any annual conference of the<br />
<strong>Association</strong>.<br />
B. Amendments of the By-Laws may be made by majority vote of the assembled<br />
membership of the <strong>Association</strong> provided that the proposed amendments shall have<br />
been approved by a majority vote of the Steering Committee.<br />
C. Proposed amendments not approved by a majority vote of the Steering<br />
Committee shall require a two-thirds vote of the assembled membership of the<br />
<strong>Association</strong>.<br />
Article X - Voting<br />
All members in attendance shall be voting members.<br />
Article XI - Harry H. Greer Award<br />
A. Selection Procedures:<br />
1. Recipients of the Harry H. Greer Award will be selected by a committee<br />
drawn from the agencies represented on the MTA Steering Committee. The CO of<br />
each agency will designate one person from that agency to serve on the Awards<br />
Committee. Each committee member will have attended at least three previous<br />
MTA meetings. The member from the coordinating agency will serve as chairman of<br />
the committee.<br />
2. Nominations for the award in a given year will be submitted in writing to<br />
the Awards Committee Chairman by 1 July of that year.<br />
3. The Chairman of the committee is responsible for canvassing the other<br />
committee members to arrive at consensus on the selection of a recipient of the<br />
award.<br />
4. No more than one person is to receive the award each year, but the award<br />
need not be made each year. The Awards Committee may decide not to select a<br />
recipient in any given year.<br />
5. The annual selection of the person to receive the award, or the decision not<br />
to make an award that year, is to be made at least six weeks prior to the date of the<br />
annual MTA Conference.<br />
B. Selection Criteria:<br />
The recipients of the Harry H. Greer Award are to be selected on the basis of<br />
outstanding work contributing significantly to the MTA.<br />
C. The Award:<br />
The Harry H. Greer Award is to be a certificate normally presented to the<br />
recipient during the Annual MTA Conference. The awards committee is responsible<br />
for preparing the text of the certificate. The coordinating agency is responsible for<br />
printing and awarding the certificate.<br />
Article XII - Enactment<br />
These By-Laws shall be in force immediately upon acceptance by a majority of the<br />
assembled membership of the <strong>Association</strong> and/or amended (in force 5 November<br />
1990).<br />
CDR Mary Adams<br />
Naval Education and Training Program<br />
Management Support Activity (Code 03)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1685 COM: (904) 452-1685<br />
Walter G. Albert<br />
Air Force Human Resources Laboratory/MOD<br />
Brooks AFB, TX 78235-5601<br />
USA<br />
COM: (512) 240-3677<br />
Dr. Cathie E. Alderks<br />
Army Research Institute<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8293 COM: (703) 274-8293<br />
LTCOL Drs Pieter S. Andriesse<br />
Ministry of Defence<br />
Directorate of RNLAF/Personnel<br />
P.O. Box 20703<br />
2500 ES The Hague<br />
The Netherlands<br />
Jane M. Arabian, Ph.D.<br />
Commander, U.S. Army Research Institute<br />
Attn: PERI-RR<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8275 COM: (703) 274-8275<br />
Klaus Arndt<br />
German Federal Armed Forces Admin Office<br />
Bonner Talweg 177<br />
D-5300 Bonn<br />
Federal Republic of Germany<br />
PH: (Germany) 228-122076<br />
MAJ Robert L. Ashworth, Jr.<br />
U.S. Army Research Institute<br />
Boise Element<br />
1910 University Drive<br />
Boise, ID 83725-1140<br />
USA<br />
COM: (208) 334-9390<br />
LIST OF CONFERENCE REGISTRANTS<br />
Annette G. Baisden<br />
Naval Aerospace Medical Inst. (Code 4<br />
Naval Air Station<br />
Pensacola, FL 32508-5600<br />
USA<br />
A/V 922-2516 COM: (904) 452-2516<br />
Louis E. Banderet, Ph.D.<br />
U.S. Army Institute of Environmental<br />
Medicine<br />
Health and Performance Division<br />
Natick, MA 01760-5007<br />
USA<br />
A/V 256-4858 COM: (508) 651-4858<br />
Dr. David W. Bessemer<br />
US Army Research Institute<br />
Field Unit - Ft. Knox<br />
Ft. Knox, KY 40121-5620<br />
USA<br />
A/V 464-4932 COM: (502) 624-4932<br />
LTCOL John Birkbeck<br />
MOD A Ed 4<br />
Court Road<br />
Eltham<br />
London SE9 5HR<br />
United Kingdom<br />
Dr. Walter C. Borman<br />
Department of Psychology<br />
University of South Florida<br />
Tampa, FL 33620-8200<br />
USA<br />
Michael J. Bosshardt<br />
Personnel Decisions Research Institute<br />
43 Main St., S.E.<br />
Suite #505<br />
Minneapolis, MN 55414<br />
USA<br />
COM: (612) 331-3680<br />
CAPT J. Peter Bradley<br />
Canadian Forces<br />
Personnel Applied Research Unit<br />
4900 Yonge St., Suite 600<br />
Willowdale, Ontario, M2N 6B7<br />
Canada<br />
A/V 827-4239 COM: (416) 224-4972<br />
Dr. Elizabeth J. Brady<br />
U.S. Army Research Institute<br />
Attn: PERI-RS, 5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8275 COM: (703) 274-8275<br />
David E. Brown, Jr.<br />
Metrica, Inc.<br />
8301 Broadway, Suite 215<br />
San Antonio, TX 78209<br />
USA<br />
LTCOL David E. Brown, Sr.<br />
AFHRL/MOM<br />
Brooks AFB, TX 78235-5601<br />
USA<br />
A/V 240-3942 COM: (512) 536-3942<br />
Lawrence S. Buck<br />
Planning Research Corporation<br />
1440 Air Rail Avenue<br />
Virginia Beach, VA 23455<br />
USA<br />
COM: (804) 460-2276<br />
Dr. Lloyd D. Burtch<br />
Air Force Human Resources Laboratory/PR<br />
Brooks AFB, TX 78235-5601<br />
USA<br />
A/V 240-3011 COM: (512) 536-3611<br />
Dr. Henry H. Busciglio<br />
U.S. Army Research Institute<br />
Attn: PERI-RS, 5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8275 COM: (703) 274-8275<br />
Charlotte H. Campbell<br />
Human Resources Research Organization<br />
295 W. Lincoln Trail Boulevard<br />
Radcliff, KY 40160<br />
USA<br />
J. R. Dick Campbell<br />
Air Traffic Services Transport Canada<br />
1574 Champneuf Dr.<br />
Orleans, Ontario KlC 6B5<br />
Canada<br />
COM: W(613) 998-6617 H(613) 837-0440<br />
574<br />
Roy C. Campbell<br />
Human Resources Research Organization<br />
295 W. Lincoln Trail Boulevard<br />
Radcliff, KY 40160-2042<br />
USA<br />
CAPT William J. Carle<br />
6435 Crestway Dr., #174<br />
San Antonio, TX 78239<br />
USA<br />
A/V 487-3694 COM: (512) 652-3694<br />
Norman A. Champagne<br />
Naval Education and Training Program<br />
Management Support Activity (Code 03172)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1355 COM: (904) 452-1355<br />
Dr. Herbert J. (Jim) Clark<br />
3410 Prince George<br />
San Antonio, TX 78233<br />
USA<br />
A/V 240-3169 COM: (512) 536-3611<br />
Harry A. Clark III<br />
8265 Campobello<br />
San Antonio, TX 78218<br />
USA<br />
A/V 487-5234 COM: (512) 652-5234<br />
Dennis D. Collins<br />
HQDA, DAPE-MR, Rm. 2C733<br />
The Pentagon<br />
Washington, DC 20310-0300<br />
USA<br />
A/V 225-9213 COM: (202) 695-9213<br />
Dr. Harry B. Conner<br />
Navy Research and Development Center<br />
Code 142<br />
San Diego, CA 92123-6800<br />
USA<br />
A/V 553-6675 COM: (619) 553-6675<br />
MAJ Anthony J. Cotton<br />
1 Psych Research Unit<br />
P.O. Box E33<br />
Queen Victoria Tce<br />
Barton ACT 2600<br />
Australia
Jack R. Dempsey<br />
Human Resources Research Organization<br />
1100 South Washington Street<br />
Alexandria, VA 22314<br />
USA<br />
COM: (703) 549-3611<br />
Dr. Grover E. Diehl<br />
Eval. & Research Branch<br />
USAF Extension Course Inst.<br />
Gunter AFB, AL 36118-5643<br />
USA<br />
A/V 446-3641 COM: (205) 279-3641<br />
CAPT Joseph M. Donnelly<br />
46 Walcheren Loop<br />
Borden, Ontario LOM ICO<br />
Canada<br />
A/V 270-3917 COM: (705) 423-3917<br />
David A. DuBois<br />
Personnel Decisions Research Institute<br />
43 Main St., S.E.<br />
Suite #405, Riverplace<br />
Minneapolis, MN 55414<br />
USA<br />
COM: (612) 331-3680<br />
MAJ R. Eric Duncan<br />
10109 Trapper's Ridge<br />
Converse, TX 78109<br />
USA<br />
Dale R. Eckard<br />
Naval Education and Training Program<br />
Management Support Activity (Code 3163)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1792 COM: (904) 452-1792<br />
Jack E. Edwards<br />
Navy Personnel R&D Center (Code 121)<br />
San Diego, CA 92111-6800<br />
USA<br />
A/V 553-7630 COM: (619) 553-7630<br />
Dr. Timothy W. Elis<br />
U.S. Army Research Institute (PERI-RG)<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 354-5786 COM: (703) 274-5610<br />
MAJ Philip J. Exner<br />
Manpower Analysis, Eval & Coordination<br />
Headquarters, U.S. Marine Corps<br />
Washington, DC 20380-0001<br />
USA<br />
A/V 224-4165 COM: (703) 614-4165<br />
Frank Fehler<br />
Flugplatz<br />
3062 Bueckeburg<br />
Federal Republic of Germany<br />
PH: (Germany) 05722-4001, Ext 307<br />
Dr. Daniel B. Felker<br />
3333 K Street, NW<br />
Washington, DC 20007<br />
USA<br />
COM: (202) 342-5000<br />
Dr. Fred E. Fiedler<br />
Department of Psychology<br />
University of Washington<br />
Seattle, WA 98195<br />
USA<br />
A/V 88-473-2032 COM: (512) 671-2032<br />
Dorothy L. Finley<br />
Army Research Institute<br />
Bldg. 41203 Attn: PERI-IG (Finley)<br />
Ft. Gordon, GA 30905-5230<br />
USA<br />
A/V 780-5523 COM: (404) 791-5523<br />
CAPT David C. Fischer<br />
HQ AFOTEC/OAH2<br />
Kirtland AFB, NM 87117-7001<br />
USA<br />
A/V 244-4201 COM: (505) 846-4201<br />
Dr. John C. Eggenberger<br />
Director, Pers Applied Research & Tng<br />
SNC Defence Products Ltd.<br />
Heritage Place, 155 Queens Street, #132<br />
Ottawa, Ontario K1P 6L1<br />
Canada<br />
COM: (613) 238-7216<br />
Dr. Max H. Flach<br />
Bundesminister der Verteidigung<br />
-FuSI8-<br />
Postfach 1328<br />
D-5300 Bonn 1<br />
Federal Republic of Germany<br />
COL James C. Fleming<br />
National Defence Headquarters<br />
101 Colonel By Drive<br />
Ottawa, Ontario K1A 0K2<br />
Canada<br />
A/V 642-3507 COM: (613) 992-3507<br />
Mr. John W.K. Fugill<br />
49 Dalton Rd.<br />
St. Ives, NSW 2075<br />
Australia<br />
(02) 4009243<br />
LTCOL Frank C. Gentner<br />
ASD/ALHA<br />
MPT Analysis & Info System Division<br />
Wright-Patterson AFB, OH 45431<br />
USA<br />
Alice Gerb<br />
Director, <strong>Military</strong> Programs Office<br />
Educational <strong>Testing</strong> Service<br />
Rosedale Road<br />
Princeton, NJ 08541<br />
USA<br />
COM: (609) 921-9600<br />
Constance A. Gillan<br />
550 West Pennsylvania Avenue #8<br />
San Diego, CA 92103<br />
USA<br />
A/V 735-7195 COM: (619) 545-7195<br />
Christa A. Grier<br />
Naval Education and Training Program<br />
Management Support Activity (Code 3124)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1765 COM: (904) 452-1765<br />
Wulf Gronwald<br />
D23<br />
Infanteriestr. 17<br />
8000 Munchen 40<br />
Federal Republic of Germany<br />
(089) 3069-2417<br />
2LT Jody A. Guthals<br />
Air Force Human Resources Lab (MOD)<br />
MPT Technology Branch<br />
Brooks AFB, TX 78235-5601<br />
USA<br />
A/V 240-3677 COM: (512) 536-3677<br />
Dr. Michael W. Habon<br />
Post Fach 1420<br />
Dornier GmbH, Dept. E&WI<br />
D-7990 Friedrichshafen<br />
Federal Republic of Germany<br />
MAJ Martin P. Hankes-Drielsma<br />
National Defense Headquarters<br />
Ottawa, Ontario K1S 3A8<br />
Canada<br />
Dr. Dieter H.D. Hansen<br />
MOD (Armed Forces Staff)<br />
Postbox 1328<br />
D-5300 Bonn 1<br />
Federal Republic of Germany<br />
Telefax (0228) 12 9059<br />
Mary Ann Hanson<br />
Personnel Decisions Research Institute<br />
43 Main St., SE<br />
Suite #405<br />
Minneapolis, MN 55414<br />
USA<br />
COM: (612) 331-3680<br />
CAPT Johnnie C. Harris<br />
USAF Occupational Measurement Center<br />
Attn: OMVD<br />
Randolph AFB, TX 78150-5000<br />
USA<br />
Mary Ellen Hartmann<br />
Questar Data Systems, Inc.<br />
2905 West Service Road<br />
Eagan, MN 55121-2199<br />
USA<br />
CDR Robert B. Hawkins<br />
Chief of Naval Education and Training<br />
Code N31T, Naval Air Station<br />
Pensacola, FL 32508-5100<br />
USA<br />
Dr. Charles W. Hesse<br />
Naval Education and Training Program<br />
Management Support Activity (Code 313)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1579 COM: (904) 452-1579
Mr. G. J. (Jeff) Higgs<br />
National Defence Headquarters<br />
101 Colonel By Drive (Attn: DMOS 3)<br />
Ottawa, Ontario K1A 0K2<br />
Canada<br />
A/V 842-7069 COM: (613) 992-7069<br />
CAPT D. Wayne Hintze<br />
Manpower Analysis, Eval & Coordination<br />
HQ, U.S. Marine Corps (Code MA)<br />
Washington, DC 20380-0001<br />
USA<br />
A/V 224-4165 COM: (703) 614-4165<br />
Mr. Charles R. Hoshaw<br />
5920 Brookview Drive<br />
Alexandria, VA 22310<br />
USA<br />
COM: (703) 694-5511<br />
Janis S. Houston<br />
Personnel Decisions Research Institute<br />
43 Main St., S.E.<br />
Riverplace Suite #405<br />
Minneapolis, MN 55414<br />
USA<br />
COM: (612) 331-3680<br />
Dr. DeLayne R. Hudspeth<br />
College of Education (EDB406)<br />
The University of Texas<br />
Austin, TX 78712<br />
USA<br />
COM: (512) 471-5211<br />
Dr. Alain Hunter<br />
Technical Director<br />
NMPC DET NODAC<br />
Bldg. 150 WNY (Anacostia)<br />
Washington, DC 20374-1501<br />
USA<br />
A/V 288-4620 COM: (202) 433-4620<br />
Barbara A. Jezior<br />
U.S. Army Natick RD&E Ctr - STRNC-YB<br />
Kansas Street<br />
Natick, MA 01760-5020<br />
USA<br />
A/V 256-5523 COM: (508) 651-5523<br />
Wayne E. Keates<br />
Personnel Applied Research & Training<br />
Division - SNC Defense Products, Ltd.<br />
155 Queen St., Suite 1302<br />
Ottawa, Ontario K1P 6L1<br />
Canada<br />
COM: (613) 238-7216<br />
Dr. Robert S. Kennedy<br />
Vice President, Essex Corporation<br />
1040 Woodcock Road, #227<br />
Orlando, FL 32803<br />
USA<br />
COM: (407) 894-5090<br />
CDR Robert H. Kerr<br />
Canadian Forces Fleet School<br />
FMO Halifax<br />
Nova Scotia B3K 2X0<br />
Canada<br />
A/V 447-8054 COM: (902) 427-8054<br />
Rex G. Kinder<br />
Rexton Consulting Services, Pty Ltd.<br />
P.O. Box 382, Manly<br />
NSW 2095<br />
Australia<br />
Robert W. King<br />
4055 Bedevere Dr.<br />
Pensacola, FL 32514<br />
USA<br />
A/V 922-1663 COM: (904) 452-1663<br />
Thomas P. Kirchenkamp<br />
Dornier GmbH<br />
P.O. Box 1420, Dept. WTWI<br />
D-7990 Friedrichshafen 1<br />
Federal Republic of Germany<br />
49-7545-5775<br />
Dr. Paul Klein<br />
Sozialwissenschaftliches Inst.<br />
der Bundeswehr, Winzererstr. 52<br />
D-8000 Munchen 40<br />
089 12003233<br />
Wolf Knacke<br />
Streitkrafteamt (Armed Forces Office)<br />
- I 7 / Militarpsychologie -<br />
Postfach 20 50 03<br />
D- 5300 Bonn - 2<br />
Federal Republic of Germany
Dr. John L. Kobrick<br />
US Army Research Institute<br />
of Environmental Medicine<br />
Kansas Street<br />
Natick, MA 01760<br />
USA<br />
A/V 256-4885 COM: (508) 651-4885<br />
Fay J. Landrum<br />
Naval Education and Training Program<br />
Management Support Activity (Code 3161)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1736 COM: (904) 452-1736<br />
Richard S. Lanterman<br />
U.S. Coast Guard HQ (G-PWP-2), Room 4111<br />
2100 Second St., S.W.<br />
Washington, DC 20593-0001<br />
USA<br />
COM: (202) 267-2986<br />
Dr. James M. Lentz<br />
Naval Education and Training Program<br />
Management Support Activity (Code 301)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1685 COM: (904) 452-1685<br />
Dr. Carl W. Lickteig<br />
U.S. Army Research Institute<br />
Field Unit Ft Knox (Attn: PERI-IK)<br />
2423 Morande Street<br />
Fort Knox, KY 40121-5620<br />
USA<br />
A/V 464-7046 COM: (502) 624-7046<br />
COL Michael Lindquist<br />
<strong>Military</strong> Education Division J-7<br />
Pentagon, Room lA724<br />
Washington, DC 20318-7000<br />
USA<br />
Dr. Suzanne Lipscomb<br />
AFHRL/PRP<br />
Brooks AFB, TX 78235-5601<br />
USA<br />
Richard M. Lopez<br />
Naval Education and Training Program<br />
Management Support Activity (Code 3171)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1357 COM: (904) 452-1357<br />
Donald F. Lupone<br />
Naval Education and Training Program<br />
Management Support Activity (Code 315)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1777 COM: (904) 452-1777<br />
Dr. Fred A. Mael<br />
U.S. Army Research Institute<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8275 COM: (703) 274-8275<br />
Dr. Rolland R. Mallette, Major (Ret)<br />
Industrial Psychologist<br />
Ontario Hydro - 700 University Ave (H3-G27)<br />
Toronto, Ontario M5G 1X6<br />
Canada<br />
COM: (416) 592-7038<br />
LTC Ken A. Martell<br />
6308 Falling Brook Drive<br />
Burke, VA 22015<br />
USA<br />
A/V 225-4560/2225 COM: (202) 695-4560<br />
Ms. Nora E. Matos<br />
Naval Education and Training Program<br />
Management Support Activity (Code 312)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1388 COM: (904) 452-1388<br />
Dr. James R. McBride<br />
6430 Elmhurst Dr.<br />
San Diego, CA 92120<br />
USA<br />
COM: (619) 582-0200<br />
Dean C. McCallum<br />
Naval Education and Training Program<br />
Management Support Activity (Code 313)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1648 COM: (904) 452-1648<br />
Donald E. McCauley, Jr.<br />
Office of Rsch & Development<br />
Room 6451<br />
Office of Personnel Management<br />
Washington, DC 20415<br />
USA<br />
COM: (202) 606-0880
Deborah L. McCormick<br />
Chief of Naval Technical Training<br />
Attn: N6211<br />
NAS Memphis<br />
Millington, TN 38054-5056<br />
USA<br />
A/V 966-5865 COM: (901) 873-5865<br />
Harold M. McCurry<br />
1919 Baldwin Brook Dr.<br />
Montgomery, AL 36116<br />
USA<br />
COM: (205) 279-5382<br />
Edward McFadden<br />
Atlanta <strong>Military</strong> Entrance Processing Station<br />
M.L. King Federal Annex; Ground Floor<br />
77 Forsyth Street SW<br />
Atlanta, GA 30303-3427<br />
USA<br />
Dr. Albert H. Melter<br />
Personalstammamt der Bundeswehr<br />
Koelner Str. 262<br />
D-5000 Koeln 90<br />
Federal Republic of Germany<br />
Central Personnel Office<br />
German Federal Armed Forces<br />
PH: (Germany) (02203) 12021 472<br />
MAJ Harold C. Mendes<br />
520 Larochelle<br />
Saint-Jean<br />
Quebec J3B 1J5<br />
Canada<br />
LT Mark R. Miller<br />
12474 Starcrest #210<br />
San Antonio, TX 78216<br />
USA<br />
A/V 240-3222 COM: (512) 536-3222<br />
William M. Minter<br />
.ECI/EDC<br />
U.S. Air Force<br />
Gunter AFB, AL 36118<br />
USA.<br />
A/V 446-4151 COM: (205) 279-4151<br />
Dr. Angelo Mirabella<br />
U.S. Army Research Institute<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8827 COM: (703) 274-8827<br />
Dr. Jimmy L. Mitchell<br />
McDonnell Douglas Missile Systems Co.<br />
8301 North Broadway, Suite 211<br />
San Antonio, TX 78109<br />
USA<br />
COM: (512) 826-8664<br />
William E. Montague<br />
Training Technology<br />
Navy Personnel R&D Center (Code 15A)<br />
San Diego, CA 92152-6800<br />
USA<br />
A/V 553-7849 COM: (619) 553-7849<br />
LCDR Tom Morrison<br />
Naval Aerospace Medical Institute<br />
Code 412<br />
Naval Air Station<br />
Pensacola, FL 32508-5600<br />
USA<br />
A/V 922-2615 COM: (904) 452-2615<br />
Dr. C. Jill Mullins<br />
Chief of Naval Education & Training<br />
N-11, Bldg. 628<br />
Naval Air Station<br />
Pensacola, FL 32508-5100<br />
USA<br />
A/V 922-4207 COM: (904) 452-4207<br />
James Gerald Murphy<br />
Naval Education and Training Program<br />
Management Support Activity (Code 03171)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1414 COM: (904) 452-1414<br />
CAPT Edward L. Naro<br />
Naval <strong>Military</strong> Personnel Command<br />
Navy Occupational Dev. & Analysis Center<br />
Bldg. 150, WNY (Anacostia)<br />
Washington, DC 20374-1501<br />
USA<br />
A/V 288-5488 COM: (202) 433-5488
Joe H. Neidig<br />
Naval Education and Training Program<br />
Management Support Activity (Code 3111)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1729 COM: (904) 452-1729<br />
Ms. Mary L. Norwood<br />
U.S. Coast Guard HQ (G-PWP-2), Room 4111<br />
2100 Second St., SW<br />
Washington, DC 20593-0001<br />
USA<br />
Dr. Lawrence H. O'Brien<br />
Dynamics Research Corporation<br />
60 Concord Street<br />
Wilmington, MA 02174<br />
USA<br />
COM: (508) 658-6100<br />
Brian S. O'Leary<br />
US Office of Personnel Management<br />
Room 6451<br />
1900 E Street, N.W.<br />
Washington, DC 20415<br />
USA<br />
COM: (202) 606-0880<br />
Robert C. Pallme<br />
Naval Education and Training Program<br />
Management Support Activity (Code 311)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1728 COM: (904) 452-1728<br />
Dr. Dale R. Palmer<br />
U.S. Army Research Institute<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8275 COM: (703) 284-8275<br />
Stephen W. Parchman<br />
NPRDC<br />
Code 15<br />
San Diego, CA 92152-6800<br />
USA<br />
A/V 553-7794 COM: (619) 553-7794<br />
Randolph Park<br />
American Institutes for Research<br />
3333 K St., NW<br />
Washington, DC<br />
USA<br />
COM: (202) 342-5000<br />
Dr. John J. Pass<br />
927 Nautilus Isle<br />
Dania, FL 33004<br />
USA<br />
Robert H. Pennington<br />
Naval Education and Training Program<br />
Management Support Activity (Code 314)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1547 COM: (904) 452-1547<br />
Carlene M. Perry<br />
United States Air Force Academy<br />
P.O. Box 4269<br />
US Air Force Academy, CO 80841<br />
USA<br />
COM: (719) 472-4551<br />
Dr. Mark G. Pfeiffer<br />
NAVTRASYSCEN Training Analysis & Evaluation<br />
Code 121<br />
12350 Research Parkway<br />
Orlando, FL 32826-3224<br />
USA<br />
A/V 960-4132 COM: (407) 380-4132<br />
William J. Phalen<br />
AFHRL/MOD<br />
Brooks AFB, TX 78235-5600<br />
USA<br />
A/V 240-3677 COM: (512) 536-3677<br />
Squadron Leader John S. Price<br />
Royal Australian Air Force<br />
U.S. Air Force Human Resources<br />
Laboratory (AFHRL/MOD)<br />
Brooks AFB, TX 78235-5000<br />
USA<br />
A/V 240-3648 COM: (512) 536-3648
COL Terry J. Prociuk<br />
Director of Personnel Psychology and<br />
Sociology<br />
National Defense Headquarters<br />
Ottawa, Ontario K1A 0K2<br />
Canada<br />
A/V 042-0244 COM: (613) 992-0244<br />
Dr. Wiebke Putz-Osterloh<br />
Lehrstuhl Psychologie<br />
Universität Bayreuth<br />
Postfach 10151, D-8580 Bayreuth<br />
Federal Republic of Germany<br />
COM: (0921) 55700<br />
Martin L. Rauch<br />
Federal Ministry of Defense, P II 4<br />
Postfach 1328<br />
5300 Bonn 1<br />
Federal Republic of Germany<br />
COM: 49-228-128543<br />
LT Daniel T. Reeves<br />
Canadian Forces<br />
Personnel Applied Research Unit<br />
4900 Yonge St., Suite 600<br />
Willowdale, Ontario, M2N 6B7<br />
Canada<br />
A/V 027-4239 COM: (416) 224-4968<br />
Beatrice Julie Rheinstein<br />
Office of Personnel Management<br />
1900 E Street, Room 6451<br />
Washington, DC 20415-5000<br />
USA<br />
COM: (202) 606-2694<br />
William M. Ritchie<br />
National Defense Headquarters<br />
101 Colonel By Drive<br />
Ottawa, Ontario K1A 0K2<br />
Canada<br />
Dr. Gwyn Robson<br />
Marine Corps Institute<br />
P. 0. Box 1775<br />
Arlington, VA 22222-0001<br />
USA<br />
A/V 288-4109 COM: (202) 433-4109<br />
Dr. Gerd Rodel<br />
Freiwilligenannahmezentrale der Marine<br />
Ebkeriege 35191<br />
D-2940 Wilhelmshaven<br />
Federal Republic of Germany<br />
COM: (04421) 792124<br />
Earl F. Roe<br />
Naval Education and Training Program<br />
Management Support Activity (Code 0317)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1335 COM: (904) 452-1335<br />
C1C Diane L. Romaglia<br />
United States Air Force Academy<br />
P.O. Box 4405<br />
U.S. Air Force Academy, CO 80841<br />
USA<br />
A/V 259-4537 COM: (719) 472-4533<br />
Kendall L. Roose<br />
Training Department<br />
Training Air Wing Five<br />
NAS Whiting Field<br />
Milton, FL 32570-5100<br />
USA<br />
A/V 868-7266 COM: (904) 623-7266<br />
COL Dr. Ger J.C. Roozendaal<br />
DPKL/afd GW<br />
Postbus 90701<br />
2509 LS The Hague<br />
The Netherlands<br />
COM: 31-71-6135450<br />
Sandra A. Rudolph<br />
Chief of Naval Technical Training<br />
Bldg C-l, Code N632<br />
NAS Memphis<br />
Millington, TN 38054-5056<br />
USA<br />
A/V 966-5591 COM: (901) 873-5591<br />
Roberto B. Salinas<br />
USAFOMSQ/OMYO<br />
Randolph AFB, TX 78150<br />
A/V 487-6811 COM: (512) 652-6811
MAJ Charles A. Salter<br />
Natick Research, Dev. & Eng. Center<br />
10 East Militia Hts.<br />
Needham, MA 02192<br />
USA<br />
A/V 256-4901 COM: (508) 651-4901<br />
Mr. William A. Sands<br />
Navy Personnel Research and<br />
Development Center (NPRDC)<br />
<strong>Testing</strong> Systems Department (Code 13)<br />
San Diego, CA 92152-6800<br />
USA<br />
A/V 553-9266 COM: (619) 553-9266<br />
Jerry Scarpate<br />
DEOMI/DR<br />
Patrick AFB, FL 32925-6685<br />
USA<br />
Sibylle B. Schambach<br />
c/o Federal Armed Forces Admin Office<br />
Bonner Talweg 177<br />
D-5300 Bonn<br />
Federal Republic of Germany<br />
PH: (Germany) 228-122099<br />
Dr. Amy C. Schwartz<br />
U.S. Army Research Institute<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333<br />
USA<br />
A/V 284-0275 COM: (703) 274-0275<br />
LCDR James W. Shafovaloff<br />
Commandant (G-PWP-2)<br />
U.S. Coast Guard Headquarters<br />
2100 2nd St., S.W., Room 4111<br />
Washington, DC 20593-0001<br />
USA<br />
FTS 8-267-1954 COM: (202) 267-1954<br />
Dr. Joyce Shettel-Neuber<br />
NPRDC<br />
San Diego, CA 92152-6800<br />
USA<br />
A/V 553-7940<br />
Dr. Guy L. Siebold<br />
US Army Research Institute, (PERI-IL)<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22330-5600<br />
USA<br />
A/V 204-8293 COM: (703) 274-0293<br />
Brian W. D. Slack<br />
Ontario-Hydro<br />
P.O. Box 338<br />
Orangeville,<br />
Ontario L9W 2Z7<br />
Canada<br />
COM: (519) 941-4620<br />
LT Wilfried A. Slowack<br />
CRS Sel Psy Ond<br />
Bruynstraat<br />
B-1120 Brussels<br />
Belgium<br />
PH: (Belgium) 02-2680050, Ext. 3279<br />
Dr. Robert M. Smith<br />
Naval Aviation Schools Command<br />
Code 12, Bldg. 633, Room 137<br />
Naval Air Station<br />
Pensacola, FL 32500-5400<br />
USA<br />
A/V 922-4120 COM: (904) 452-4120<br />
Dr. J. Michael Spector<br />
AFHRL/IDC<br />
Brooks AFB, TX 78235-5601<br />
USA<br />
A/V 240-3036 COM: (512) 536-3036<br />
Yvonne W. Squires<br />
10968 Portobelo Dr.<br />
San Diego, CA 92124<br />
USA<br />
A/V 553-8264 COM: (619) 553-0264<br />
Herb C. Stacy<br />
Chief of Naval Technical Training<br />
Bldg C-l, Code N5A<br />
NAS Memphis<br />
Millington, TN 38054-5056<br />
USA<br />
A/V 966-5984 COM: (901) 873-5984<br />
Michael R. Staley<br />
7521 126th Avenue<br />
Kirkland, WA 98033<br />
USA<br />
COM: (206) 869-5501
Paul P. Stanley II<br />
USAFOMC/OMD<br />
Randolph AFB, TX 78150-5000<br />
USA<br />
A/V 481-5234 COM: (512) 652-5234<br />
Dr. Alma G. Steinberg<br />
U.S. Army Research Institute<br />
Attn: PERI-IL<br />
5001 Eisenhower Avenue<br />
Alexandria, VA 22333-5600<br />
USA<br />
A/V 284-8293 COM: (703) 274-8293<br />
Stanley D. Stephenson<br />
Dept of Computer Information<br />
System & Admin Science<br />
Southwest Texas State Univ.<br />
San Marcos, TX 78666-4616<br />
USA<br />
COM: (512) 245-2291<br />
Dr. Lawrence J. Stricker<br />
Educational <strong>Testing</strong> Service<br />
Princeton, NJ 08541-0001<br />
USA<br />
COM: (609) 734-5551<br />
J. S. Tartell<br />
USAF OMSQ/OMY<br />
Randolph AFB, TX 78150-5000<br />
USA<br />
DSN 481-6623 COM: (512) 652-6623<br />
John W. Thain<br />
583 Cypress<br />
Monterey, CA 93940<br />
USA<br />
A/V 818-5164 COM: (408) 641-5764<br />
William J. Tharion<br />
Health & Performance Division<br />
USA Research Inst of Env Med<br />
Natick, MA 01760-5001<br />
USA<br />
A/V 256-4115 COM: (508) 651-4115<br />
Philip A. Thornton<br />
U.S. Coast Guard HQ (G-PWP-2), Room 4111<br />
2100 Second Street S.W.<br />
Washington, DC 20593-0001<br />
USA<br />
COM: (202) 267-1954<br />
LCDR Barbara T. Transki<br />
Navy Occupational Development & Analysis Center<br />
Bldg 150, WNY Anacostia<br />
Washington, DC 20374-1501<br />
USA<br />
A/V 288-4633 COM: (202) 433-4633<br />
Thomas Trent<br />
<strong>Testing</strong> System Dept., Code 132<br />
Navy Personnel R&D Center<br />
San Diego, CA 92152-6800<br />
USA<br />
A/V 553-7637 COM: (619) 553-7637<br />
Ms. Susan Truscott<br />
Dir of Social & Economic Analysis<br />
Operational Research/Analysis Establishment<br />
101 Colonel By Drive<br />
Ottawa, Ontario K1A 0K2<br />
Canada<br />
Dr. James W. Tweeddale<br />
Chief of Naval Education and Training<br />
NROTC Division<br />
NAS Pensacola<br />
Pensacola, FL 32508<br />
USA<br />
A/V 922-4983 COM: (904) 452-4903<br />
Dr. Lloyd W. Wade<br />
Special Programs Department<br />
Marine Corps Institute<br />
Arlington, VA 22222-0001<br />
USA<br />
A/V 288-2612 COM: (202) 415-9229<br />
Dr. Raymond O. Waldkoetter<br />
U.S. Army Soldier Spt Center<br />
Attn: ATSG-DDN (Bldg 401)<br />
Fort Harrison, IN 46216-5700<br />
USA<br />
A/V 699-3819 COM: (317) 542-3879<br />
Aubrey E. Walker<br />
U.S. Army Infantry School<br />
Attn: ATSH-TDT-I<br />
Fort Benning, GA 31905-5593<br />
USA<br />
Clarence L. Walker<br />
Rt. 1, Box 593<br />
Purcellville, VA 22132<br />
USA<br />
COM: (703) 669-6427<br />
Dr. Brian K. Waters<br />
HumRRO<br />
1100 South Washington Street<br />
Alexandria, VA 22314-4499<br />
USA<br />
COM: (703) 706-5647<br />
Johnny J. Weissmuller<br />
Metrica, Inc.<br />
8301 Broadway, Suite 215<br />
San Antonio, TX 78217<br />
USA<br />
COM: (512) 822-6600<br />
LTCOL Karol W.J. Wenek<br />
<strong>Military</strong> Leadership & Management Dept<br />
Royal <strong>Military</strong> College of Canada<br />
Kingston, Ontario K7K 5L0<br />
Canada<br />
COM: (613) 541-6304<br />
James D. Wiggins<br />
Naval Education and Training Program<br />
Management Support Activity (Code 3104)<br />
Pensacola, FL 32509-5000<br />
USA<br />
A/V 922-1323 COM: (904) 452-1323<br />
CDR Frederick F.P. Wilson<br />
Canadian Forces Personnel Applied<br />
Research Unit<br />
4900 Yonge St., Suite 600<br />
Willowdale, Ontario M2N 6B7<br />
Canada<br />
COM: (416) 224-4964<br />
Dr. Lauress L. Wise<br />
DMDC<br />
99 Pacific St., #155A<br />
Monterey, CA 93940-2453<br />
USA<br />
COM: (408) 655-4000<br />
Dr. Martin F. Wiskoff<br />
307A Mar Vista Drive<br />
Monterey, CA 93940<br />
USA<br />
COM: (408) 373-3073<br />
Darrell A. Worstine<br />
Commander, USAPIC<br />
Attn: ATNC-MO<br />
200 Stovall Street<br />
Alexandria, VA 22332-1330<br />
USA<br />
A/V 221-3250 COM: (703) 325-3250<br />
Timothy C. Zello<br />
U.S. Army Ordnance Center<br />
Attn: ATSL-MD<br />
Aberdeen Proving Ground, MD 21005-5201<br />
USA<br />
A/V 298-4115 COM: (301) 278-4115
Abrahams, N. M., 486<br />
Albert, W. G., 310, 316<br />
Alderks, C. E., 432<br />
Alley, F., 292<br />
Arabian, J. M., 226<br />
Arndt, K., 104<br />
Ashworth, MAJ R. L., Jr., 199<br />
Baker, H., 292, 298, 304<br />
Banderet, L. E., 334, 339<br />
Bayes, A. H., 425<br />
Bennett, W. R., 116<br />
Bergquist, Maj T. M., 156<br />
Bessemer, D. W., 150<br />
Borman, W. C., 268, 492, 498, 504<br />
Bosshardt, M., 504, 505, 516<br />
Bowler, E. C., 535<br />
Bradley, Capt. J. P., 262<br />
Brady, E. J., 322<br />
Brooks, J. T., 541<br />
Brown, G. C., 553<br />
Buck, L. S., 274<br />
Buckenmyer, D. V., 116<br />
Burch, R. L., 486<br />
Busciglio, Henry H., 380<br />
Campbell, C. H., 528, 541<br />
Campbell, R. C., 528, 529, 541<br />
Carle, W. J., 541<br />
Clark, H. J., 460<br />
Collins, D. D., 414<br />
Conner, Dr. H. B., 312<br />
Crafts, J. L., 535<br />
Crawford, K., 504, 516<br />
Crawford, R. L., PhD, 167<br />
Cymerman, A., 408<br />
Dart, 1Lt T. S., 156<br />
Dauphinee, SSG D. T., 339<br />
Dempsey, J. R., 25<br />
Dhammanungune, S., 304<br />
Diehl, G. E., 128<br />
Dittmar, M. J., 316<br />
Doyle, E. L., 529<br />
Dubois, D., 504, 505, 516<br />
Dunlap, W. P., 220<br />
Edwards, J. E., 31, 486<br />
Eggenberger, J. C., PhD, 167<br />
Elig, T. W., 19<br />
Ellis, J. A., 132<br />
Evans, R. M., 191<br />
Exner, Maj P. J., 535<br />
Fayfich, P. R., 70<br />
Fehler, F., 180<br />
Felker, D. B., 535<br />
INDEX OF AUTHORS<br />
Fiedler, E., 392<br />
Finley, D. L., 94, 99<br />
Fowlkes, J. E., 220<br />
Goldberg, E. L., 474<br />
Greene, C. A., 241<br />
Guthals, 2Lt J. A., 76, 156<br />
Hand, D. K., 82, 316<br />
Hansen, H. D., 351<br />
Hanson, M. A., 268, 498<br />
Harris, D. A., 25<br />
Harris, J. C., 547<br />
Harris, J. H., 528<br />
Hawkins, R. B.<br />
Heslin, Captain, 174<br />
Houston, J., 504, 522<br />
Hoyt, R., 408<br />
Hudspeth, Dr. D. R., 70<br />
Ince, V,, 241<br />
Jezior, B. A., 241<br />
Johnson, R. F., 210<br />
Jones, M. B., 419<br />
Jones, P. L., 122<br />
Kennedy, R. S., 220, 419<br />
Kittredge, R., 408<br />
Klein, P., 88<br />
Knight, J. R., 116<br />
Kobrick, J. L., 210<br />
Koger, Major M. E., 174<br />
Laabs, G. J., 398<br />
Leaman, J. A., 455<br />
Lescreve, F., 216<br />
Lesher, L. L., 241<br />
Lester, L. S., 280<br />
Lickteig, C. W., 174<br />
Lieberman, H. R., 334<br />
Lindsay, T. J., 438<br />
Luisi, T. A., 280<br />
Luther, S. M., 280<br />
Mael, F. A., 286<br />
Marlowe, B. E., 408<br />
Martell, LTC K. A., 6<br />
Mayberry, P. W., 535<br />
McCauley, Jr., D. E., 51, 58, 64<br />
McCormick, D. L., 122<br />
McGee, S. D., 404<br />
McMenemy, D. J., 210<br />
Melter, A. H., 357<br />
Menchaca, Capt J., Jr., 76<br />
Mentges, W., 357<br />
Mirabella, A., 162<br />
Mitchell,
Muraida, D. J. 185<br />
O’Brien, L. H. 251<br />
O’Leary, B. S. 51, 58, 64<br />
O’Mara, M. 339<br />
Olivier, L. 76<br />
Owens-Kurtz, C, K. 492<br />
Palmer, D. R. 328<br />
Parchman, S. W. 132<br />
Paullin, C. 498<br />
Perez, CPT P. J. 334<br />
Perry, C. M. 235<br />
Pfeiffer, G. 76<br />
Pfeiffer, M. G. 191<br />
Phalen, W. J. 82, 310, 316<br />
Phelps, Dr. R. H. 199<br />
Pimental, N. A. 339<br />
Popper, R. 241<br />
Price, J. S., Squadron Leader 70<br />
Putz-Osterloh, W. 362<br />
Quenette, M. A. 398<br />
Reeves, Lt(N) D. T. 12<br />
Rheinstein, J. 51, 58, 64<br />
Riley, SGT R. H. 339<br />
Rodel, G. 368<br />
Romaglia, C1C D. L. 345<br />
Roozendaal, Col. G. J. C. 466<br />
Rosenfeld, P. 31<br />
Rudolph, S. A. 204<br />
Rumsey, M. G. 322<br />
Rushano, T. M. 386<br />
Russell, T. L. 492<br />
Salter, MAJ C. A. 280<br />
Sands, M. 298<br />
Sands, W. A. 245<br />
Schambach, S. B. 110<br />
Schwartz, A. C. 226, 256<br />
Sheposh, J. P. 474<br />
Sherman, F. 504, 522<br />
Shettel-Neuber, J. 474<br />
Shukitt-Hale, B. L. 334<br />
Siebold, G. L. 438, 444<br />
Silva, J. M. 256<br />
Simpson, LTC R. L. 314<br />
Skinner, J. 345<br />
Slowack, W. 216<br />
Spector, J. M. 185<br />
Spier, M. 304<br />
Spokane, A. 298<br />
Stanley II, P. P. 235, 547<br />
Steinberg, A. G. 455<br />
Stephenson, J. A. 138<br />
Stephenson, S. D. 136, 144<br />
Swirski, L. 292, 304<br />
Tartell, J. S. 547<br />
Thain, J. W. 231<br />
Tharion, W. J. 408<br />
Thomas, P. J. 31<br />
Toyota, SGT R. M. 339<br />
Trent, T. 398<br />
Truscott, S.<br />
Turnage, J. J. 229, 413<br />
Tweeddale, J. W. 480<br />
Vandivier, P. L. 453<br />
Van Hemel, S. 252<br />
Vaughan, D. S. 116<br />
Waldkoetter, R. 0. 450<br />
Walker, C. L. 37<br />
Waters, B. K. 25<br />
White, L. A. 328<br />
White, W. R., Sr. 450<br />
Williams, J. E. 235<br />
Winn, LTC D. H. 6<br />
Wiskoff, M. 504, 505, 516, 522<br />
Witt, SSG C. E. 433<br />
York, W. J., Jr., 94, 99<br />
Young, M. C. 328<br />
Zimmerman, R. A. 504, 516<br />