Technical Report - International Military Testing Association

Technical Report - International Military Testing Association

Technical Report - International Military Testing Association

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

UNCLASSIFIED/UNLIMITED<br />

<strong>Technical</strong><br />

<strong>Report</strong><br />

distributed by<br />

Defense <strong>Technical</strong> Information Center<br />

DEFENSE LOGISTICS AGENCY<br />

Cameron Station, Alexandria, Virginia 22314<br />

UNCLASSIFIED/UNLIMITED<br />

Fort Benjamin Harrison<br />

Indianapolis, Indiana<br />


MILITARY TESTING ASSOCIATION OFFICERS<br />

PRESIDENT<br />

Colonel James C. Donaghey<br />

Commanding Officer<br />

US Army Enlisted Evaluation Center<br />

PRESIDENT-ELECT<br />

Colonel John V. Patterson, Jr.<br />

Commanding Officer<br />

6570th Personnel Research Laboratory, US Air Force<br />

STEERING COMMITTEE<br />

Dr. Donald L. Warno, Chairman<br />

US Army Enlisted Evaluation Center<br />

Colonel James C. Donaghey, US Army<br />

Colonel John V. Patterson, Jr., US Air Force<br />

Captain Richard M. Hayes, US Navy<br />

Captain J. P. Martin, US Coast Guard<br />

Lieutenant Colonel C. J. Chatroon, US Army<br />

Lieutenant Colonel A. S. Kneuf, US Air Force<br />

Major Frank L. McLanathan, US Air Force<br />

Lieutenant Commander Frances S. Turner, US Coast Guard<br />

Major Donald L. Diamond, US Marine Corps<br />

Lieutenant B. R. William, US Coast Guard<br />

Mrs. Mabel O. Brunner, US Air Force<br />

Mr. C. J. Hmeluro, US Navy<br />

SIXTH ANNUAL CONFERENCE COMMITTEE<br />

Mr. Frank B. Price, Chairman<br />

Captain Henry H. Bahner, Vice Chairman<br />

Major Merril R. Owen<br />

Mr. William W. Wance<br />

Second Lieutenant Martin S. Brown<br />

SIXTH ANNUAL CONFERENCE PROGRAM COMMITTEE<br />

Mr. Claude P. Bridger, Chairman<br />

Mr. Dale R. Baker<br />

Mr. John S. Brand<br />

Mr. Arthur E. Hoffman<br />

The opinions expressed in the papers presented in these proceedings are those of the authors and are not to be construed as being official or in any way representative of any of the US Armed Services.<br />


General Symposium I: Test Construction Procedures, William W. Wance, Chairman . . . 121<br />

Test Construction as a Subsystem Within a Systems Approach to Training Technology, Isadore J. Bt- . . . 122<br />

Summary of Pragmatic Creativity in Examination Construction, John Mediford . . . 126<br />

Evaluation of Motor Skills . . . 127<br />

Performance Test Construction, Fred B. Horn . . . 132<br />

General Symposium II: Interpretations and Uses of Evaluation Results, Mrs. Genevieve Schulter, Chairman<br />

One Interpretation of the Major Goals of Specialty Knowledge Testing in the United States Air Force, Stephen . . . 136<br />

Uses of MOS Evaluation Test Results, J. B. Rohrsfter . . . 141<br />

General Symposium III: Job Analysis for Test Development Purposes, Frank R. Rice Jr., Chairman . . . 148<br />

New Perspectives in Job Analysis, Joseph E. Morsh . . . 149<br />

MOS Evaluation Test Outline Seminar, Curtis D. McBride . . . 156<br />

Group Discussion I: Item Writing Procedures for Increasing Validity, I. J. Newman, Chairman . . . 160<br />

Group Discussion II: Non-Empirical Validation of Test Items, J. L. Finucane, Chairman . . . 161<br />

Group Discussion III: Test and Item Revision Techniques, J. E. Partington, Chairman . . . 171<br />

Paper: Answers to Common Criticisms of Tests, Frank R. Rice Jr. . . . 173<br />

Committee <strong>Report</strong> I: Steering Committee . . . 181<br />

Committee Report II: Item Writers Aptitude Test Development Committee . . . 182<br />

Roster of Conferees . . . 184<br />


Foreword<br />

Unheralded at its inception in October 1959, the Military Testing Association has developed into a powerful influence in military testing. The purpose of the Association is to bring together annually representatives from the various armed services to discuss and exchange ideas concerning job proficiency evaluation of enlisted personnel. Thus, one of the more important and productive aspects of the annual conference is a meeting of minds. Conferences have been held within a liberal framework of committee meetings. While the liberty of committee discussions has provided opportunity for creative thinking and productive discussion, the structure necessary for exchanging as much information as possible within a short time was lacking. A program for the Sixth Annual Conference was developed within which maximum opportunity would exist for the presentation of research results and operating information through prepared papers, and for the exchange of ideas and creative thinking through group discussions. The conference was divided into two theoretical and two general symposia. Papers as presented in the symposia and reports of group discussions are reproduced in these proceedings. It is anticipated that these proceedings of the Sixth Annual Conference and the proceedings of future conferences will become a source of information on psychological testing in the military establishment.<br />

The Sixth Annual Conference was held at the US Army Enlisted Evaluation Center, Fort Benjamin Harrison, Indiana. The commanding officer and staff of the Enlisted Evaluation Center wish to express appreciation to all conferees for their support and complete cooperation. The success of the Sixth Annual Conference was due to the attitude and efforts of the participants.<br />


MILITARY TESTING ASSOCIATION<br />

Sixth Annual Conference<br />

Colonel James C. Donaghey<br />

Presiding<br />



Remarks to<br />

Military Testing Association Conference<br />

Office of Personnel Operations<br />

Department of the Army<br />

Thank you, Colonel Donaghey.<br />

I want to begin by expressing my appreciation for this opportunity. I can't stand here today and talk to you as a test psychologist or an expert in psychological measurement, because I am not. I have been for several years assigned to positions where I have used the information that you people produce. In my previous assignment, I learned to appreciate the value of sound measurement in helping anyone assigned a task and trying to make the best possible use of personnel resources to complete the task.<br />

The evaluation of job proficiency offers a challenge that is not easily overcome. In this day of experts and advanced technology, it is still difficult to expect an accurate evaluation of individual potential accomplished with paper and pencil. I am firmly convinced, however, that you have to a large degree accomplished this end. Just as management by the rule of thumb has been replaced by management with a slide rule, so too has evaluation by intuition been replaced by scientific evaluation. Gentlemen, you have passed the point of no return for objective evaluation of job mastery. We in the Army are firmly convinced that our Enlisted Evaluation System is here to stay. I believe that our Enlisted Evaluation System is the most scientifically advanced tool with which we have to date been able to provide the commander and his personnel staff.<br />

Unfortunately, I must insist that we have not been able to instill in our commanders the degree of confidence in the system which we would like to achieve. Now that Colonel Donaghey and his people have developed instruments that are reliable and valuable in measuring individual job proficiency, my mission from Headquarters, Department of the Army is to help these people develop the understanding and confidence of our Army commanders. If I make no other point today, I am resolved to<br />


tell you that the greatest shortcoming in psychological test measurement today is the lack of understanding of proficiency evaluation by the very people that must use this tool. Without a knowledge of the techniques involved, management is frequently slow to appreciate the tool that can be provided by this service. The Enlisted Evaluation Center here has taught me much of what I know about psychological test measurement with paper and pencil. I don't pretend to know the details, that is not my job, but I do believe we have something here that we can never again do without, and the largest task remaining is to make those who must use this information aware of its value.<br />

Enlisted evaluation in the Army goes back long before the current system. I can recall the Army's Career Guidance Plan of 1940, which was an attempt, prior to World War II, to use testing scores to measure promotion potential and determine the ultimate promotion. Going back even earlier, the individual branches of the Army developed tests to aid them in the promotion of noncommissioned officers. The system we have in the Army today goes far beyond those initial endeavors. We use our enlisted evaluation to determine minimum qualification in the military specialty, to determine the award of proficiency pay for superior qualification, and as an indication of potential for the promotion of our enlisted personnel. I am sure most of you are aware of these applications for our system of enlisted evaluation. We also use enlisted evaluation for grade determination for personnel reverting to an enlisted status. It is not generally known that we also use the enlisted evaluation scores for assignment selection purposes at departmental level.<br />

In the Enlisted Personnel Directorate, the assignment of our master sergeants, first sergeants, and sergeants major is made only after a consideration of the evaluation score. We have established certain minimum evaluation scores for assignment to high priority and critical positions. We have learned the value of enlisted evaluation scores in selecting personnel for these assignments; for obvious reasons, we want our best people.<br />

The use of our evaluation system in the Army has grown by leaps and bounds. We have recently initiated a program for evaluating a soldier in skills other than his primary. We are trying to establish a valid skill level in more than one specialty, enabling us to know fully the potential of our enlisted personnel. We like to feel that this expansion of our evaluation capabilities will enable us to see the whole soldier, rather than just the primary skills of the soldier. Thus, evaluation of secondary skills will enable us to improve the use of our manpower resources. The old problem is that certain skills are needed in greater proportion overseas than in the United States. If we only<br />


have one skill identified for the soldier and that skill is one which we need primarily overseas, that soldier doesn't get his fair share of time in the United States. When two or more skills are identified, more equitable distribution of overseas duty is possible.<br />

Another use of the Enlisted Evaluation System is the evaluation of Army Reserve and National Guard enlisted personnel. Evaluation of these people will aid us in determining mobilization potential as well as providing an additional incentive for members of the Reserve and National Guard to maintain their military skills. We are extremely proud that our Enlisted Evaluation System has been extended to our enlisted Reserve components. We consider this a step in the furtherance of the one Army concept of teamwork between the Army components.<br />

Evaluating secondary and additional skills and evaluating personnel of the Reserve components has been made possible by the use of the 315 system, which we now use to handle the test data here in the Enlisted Evaluation Center. The use of automatic data processing has enabled us to look forward to future applications for our enlisted evaluation. Here at this installation next month, representatives of the Continental Army Command, my office, and the Enlisted Evaluation Center will meet to discuss the potential for providing a training evaluation report to our commanders from the company level to the field Army. If the prerequisite skills for a given job can be individually evaluated categorically, they can also be evaluated and summarized by the unit or by the specialty. This would enable the unit commander to determine areas which require emphasis in his training program, as well as helping us devise school programs which will insure the adequate training of personnel in their military specialty. We now divide our evaluation of enlisted personnel into five to ten areas appropriate to the given specialty. It is these areas which we intend to analyze for training deficiencies. It is envisioned that all our commanders will be able to use the information which we provide. I make this point here because I want you to understand that we feel that the product of our enlisted evaluation will be used by our operations people and our training people. Enlisted evaluation can and will support the very heart of our Army mission of readiness to fight, be it in limited or general ground combat.<br />

I want to conclude on this note. The contributions of objective evaluation of individual job proficiency to the mission of the armed services of the United States are immeasurable. I am convinced, and in fact I know, that more and more of our senior officers are recognizing daily the contribution of proficiency evaluation to the task of management. As your system becomes more sophisticated and you reach for perfection in the reliability and validity of your tests, you must never forget that without the confidence of management, your product does little good. The most valid evaluation accomplishes nothing when it falls on the deaf ears of management. The challenge before you is more than merely improving your evaluation; it is now the use of objective evaluation for the many purposes which it can serve. We know that each of you will contribute, in the course of this conference, things that will help us improve our Enlisted Evaluation System. We hope that in your time here with the Evaluation Center, we can offer you something to take back to your jobs.<br />

I thank you.<br />


Remarks to<br />

Military <strong>Testing</strong> Association Conference<br />

MAJOR DONALD L. DIAMOND<br />

Officer in Charge, Test Section<br />

US Marine Corps Institute<br />

Good afternoon, ladies and gentlemen. I am looking forward to having the opportunity of mingling with you gentlemen who are experts in the testing field, as I have very limited experience and knowledge of this subject. I will briefly explain the Marine Corps Testing System by first stating the mission of the Marine Corps Institute. Our mission is to provide correspondence courses in basic military subjects to enlisted Marines and to prepare and process examinations as directed by the Commandant of the Marine Corps. We have a third mission which often makes it difficult to accomplish the first two: to provide ceremonial troops to the Marine Barracks for participation in weekly parades during the summer months and other ceremonial commitments throughout the year, such as military funerals, state arrivals of foreign dignitaries, and other White House functions. To get back to the testing business, the tests we prepare and administer are not performance evaluation tests but are used to measure the examinee's knowledge of general military subjects. We prepare and administer four groups of tests.<br />

The first group is the Officers' Administrative Subjects Examination, which is administered twice a year to lieutenants and captains. The purpose of this examination is to motivate company grade officers to familiarize themselves with various administrative type subjects with which they will become involved throughout their careers. The subjects are not tactical nor technical in nature, but cover such areas as Law and Legal Matters, Supply Management, Financial Management, Personnel Administration, General Administration, and Organization, Command and Command Relationships. The officer is administered a test as a lieutenant and again as a captain. He must pass the test or retake it the subsequent year until he does pass it. To date, at least, the results of this test do not reflect upon his fitness for promotion, as the promotion board does not see the test results. However, if the officer is a repeated failure, his commanding officer must make a comment to this effect in his fitness report. At this point, the test results will reflect upon his fitness for promotion, as the promotion board views all fitness reports.<br />

The test items are prepared by a board of officers selected from Headquarters, Marine Corps. They are put into test form, administered, scored, and reported upon by Marine Corps Institute.<br />

The second group is the General Military Subjects Test, which is administered three times a year to corporals, sergeants, and staff sergeants, both regulars and reserves. This test is used to determine their eligibility for promotion and must be passed before they are considered by the Promotion Board.<br />


In the Marine Corps we expect all Marines, regardless of their job, to be potential combat Marines. Therefore, this test covers those subjects which are considered to be indispensable for Marines who may be required to perform duty in a combat area. These subjects include Squad and Platoon Tactics, Guerrilla Warfare, Scouting and Patrolling, NBC Defense, and First Aid, to name a few.<br />

Technical proficiency within occupational fields is determined by means of fitness reports and proficiency marks submitted by the Marine's commanding officer on a periodic basis, but not less than once every six months.<br />

The third group of tests is the General <strong>Military</strong> Subjects Proficiency Evaluation Test, prepared once a year and provided to unit commanders in the field. They administer the test, score it with templates provided by MCI, and evaluate the results with statistical forms and instructions provided by MCI. The purpose of this test is to enable unit commanders to evaluate their own unit and individual training programs and spot the weaknesses in various subject areas. The subjects covered by this test include those previously mentioned in the preceding test plus: Close Order Drill, Military Courtesy and Discipline, Map Reading, Communications, Military Training, Domestic Disturbances, and Technique of Instruction. A different test is prepared for each of the three rank groups: one for privates and lance corporals, one for corporals and sergeants, and one for staff NCOs. The tests range from 175 items to 250 items, whereas our other tests are limited to 100 items.<br />

The fourth group is the Inspector General Test, used by the IG team during their annual inspection tours of all Marine Corps units. These tests are used to evaluate the unit training program also. They are prepared to test the same rank groups as the previous tests. The IG team administers the tests and mails the results to MCI for scoring, evaluating, and reporting the results back to the unit and the IG team.<br />

MCI is currently preparing specifications for a computer system to be installed, hopefully, toward the end of fiscal year 1966. This system will be a boon to our current task of evaluating correspondence courses and the various tests which we administer and score. At this point I would like to express an appeal to those of you who have computer installations supporting either correspondence courses or your testing programs: please contact me at your earliest convenience and explain your systems and operations. I would like to carry back your ideas and methods to incorporate into our proposed system specifications.<br />

In closing, I am looking forward to gaining a lot of valuable information to help me and the Marine Corps in our future testing programs.<br />

Thank you.<br />


Remarks to<br />

Military Testing Association Conference<br />

It has been suggested that my remarks should concern the future of testing in armed-services examinations. Since many of the MTA members here today may be as new to testing as I am, it seems proper that we look both to the past and present before we attempt to look to the future. There are two questions which both old and new members must take time to consider before we attempt to evaluate the future:<br />

1. Have we built our house of testing substantially? Is our foundation solid?<br />

2. Can we safely look into the future? Are we aware of our potentials? Are we sound in our present performance?<br />

The answers to both questions seem to be positive. The past reports of our organization indicate that we have been united in our activity. We have three common areas for self-praise:<br />

1. We can boast of our concrete attention to the vast amount of detail involved in building successful testing materials.<br />

2. Our concern with the problem of creating better tests continues to be unremitting.<br />


Accomplishment of our original aims has become of minor importance compared to the undreamed-of expansion that has occurred. In the early days, we could project an ideal that was one of horizontal development. We could well imagine and work for improvements in quality of production and training, an increase in the number of tests written (and, at the time, increases in personnel to carry out the load), and an increase in the prestige of the individual testers. Most of this has been done, although it wasn't until the 1963 conference that we started to work on any deliberate measure to make the values of our work known.<br />

Time has proved that the accomplishment of our mission has been a step, not an end. We have increased in depth as well as laterally. We, the MTA, have developed devices, means, and techniques that hold challenging possibilities.<br />

We have reached the point where we can explore the outer spaces of testing: we are physically prepared. We are on sound ground for our takeoff. But are we psychologically ready? Are you ready? Do we relax in our smoothly running operations, or do we meet the challenge of what we can do by applying ourselves to the MTA's possible drive in exploring new test horizons? Our sources are the reports of the previous conferences. If I may take this liberty, it would seem from these reports that in some areas the MTA has been too content with data horizons to recognize that the space beyond them is really "Space Unlimited." At the Naval Examining Center we have reached the point where the ordering of examinations by the Fleet, a process that hitherto took 742 man-years per examination period, can now be accomplished automatically. No longer does each unit have to spend that time in listing necessary examinations. The new automation now completes the entire round of the examination process. In addition, we have reached the point where individual returns are automatically reported to the PAMI system, so that all personnel records are accurately up-to-date almost as soon as the individual score is known. This affords more efficient distribution of enlisted personnel throughout the Navy.<br />

In past MTA meetings, according to the records, we have been concerned with the "nuts and bolts" only. Only in 1963, when we attempted to gain the recognition of DOD, did we depart from purely utilitarian aspects. We have gained great technical maturity. Now we must concentrate on values of testing. We must reach out and explore new<br />
.-. _-._ .______ . - - ---._-..--_.---_P_--__-<br />

.<br />

. . . .<br />

. . .<br />

8<br />

I<br />

.<br />

__ . _-. .-. . -. . -<br />

.<br />

. .<br />

f<br />

1I<br />

1<br />

f<br />

.<br />

Y


-.<br />

/<br />

,’<br />

i<br />

_. . . _. __ _. __. .._ _ ___ -. __- .-- -_ - ._ __-_ . _-.-___ -___ .__-..<br />

,<br />

. .<br />

. . . . . . --..-_ _ _._<br />

.<br />

. _ .__- .._. --__--. -- - -_---. . - . i ._..... -<br />

_. _--- _--. ------ .-<br />

.<br />

\r<br />

*<br />

Perhaps, as a group we have tended to remain static. Is this because we feel that our interest in educational progress is subservient to membership in our service? All of our efforts are devoted to some aspect of testing. We should therefore realize that education is one of the most powerful tools in this field of testing.<br />

Since the top specialists in the field of testing are gathered here, meeting on a common ground of interest, have we the ethical right to remain static, to let our meetings remain merely nicely organized conventions?<br />

I am positive we all have a strong faith in the individual potential of the MTA. We believe strongly that it can become the greatest force in testing everywhere. We believe that we are prepared and able to explore unknown fields. We believe that our exploration must be manned. Let us now, at this conference, use our collective creative imagination to assert ourselves.<br />


Remarks to<br />

Military Testing Association Conference<br />

LIEUTENANT COLONEL A. S. KNEUF<br />

for<br />

COLONEL JOHN V. PATTERSON, JR.<br />

Commanding Officer<br />

6570th Personnel Research Laboratory<br />

US Air Force<br />

Two items of probable interest to the conferees of the Military Testing Association concerning the Air Force Specialty Knowledge Testing program are:<br />

(1) The closer relationship developing between Air Force training and Specialty Knowledge Testing, and<br />

(2) Recent developments in techniques of job analysis.<br />

Although both of these topics will be the subjects of individual presentations during the conference, I would like to briefly introduce them.<br />

Approximately a year ago the Air Force adopted what is called a dual-channel approach to on-the-job training. Under this concept an airman is provided with a series of Career Development Courses which are intended to cover the broad fundamentals, theory, and principles which he is required to know to progress or be upgraded in his specialty. These courses are developed by Air Training Command technical writers and printed and distributed by the Extension Course Institute of the Air University. They are intended to be self-study type courses which the airman completes as an ordinary correspondence course.<br />

The other channel of the dual-channel concept is more concerned with the specific training required to accomplish the specific duties of his immediate job. This aspect is more concerned with performance on specific narrow areas of the specialty or the unique equipment of his particular unit. In this phase he applies the fundamentals and theory learned through the Career Development Courses to his specific unit mission. This phase is supervised and accomplished primarily by the individual commands, wings, or even down to squadron levels.<br />

SKTs under our present concept and method of development are related directly to the Career Development Courses. They operate primarily as end-of-course examinations administered under controlled conditions. A research program is presently underway to evaluate the SKTs developed to parallel the Career Development Courses.<br />


The Air Force is on the verge of adopting a new<br />
technique of analyzing jobs which we are hopeful will provide more<br />
definitive information on which to base both training and testing. Under<br />
this system, task inventories are prepared and information gathered from<br />
job incumbents concerning amount of time, importance, etc., relative to<br />
each specific task. Through the use of high-speed computers and grouping<br />
techniques, highly useful information can be obtained to assist in<br />
describing various jobs more accurately, planning training curricula,<br />
and writing better tests.<br />
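The speech does not spell out the grouping technique itself, so the sketch below is only an illustration of the general idea in modern terms: given hypothetical task-inventory data (the percent of time each incumbent reports spending on each task), incumbents with sufficiently overlapping time profiles are grouped into job types. The similarity measure, the threshold, and the sample data are all assumptions, not the Air Force's actual procedure.

```python
# Illustrative sketch only: the similarity measure, threshold, and data below
# are invented, not the Air Force's actual job-analysis algorithm.

def profile_similarity(a, b):
    """Fraction of a's total reported time that profile b also covers."""
    shared = sum(min(a[t], b[t]) for t in a)
    total = sum(a.values())
    return shared / total if total else 0.0

def group_incumbents(profiles, threshold=0.6):
    """Greedily group incumbents whose task-time profiles overlap above threshold."""
    groups = []
    for name, prof in profiles.items():
        for g in groups:
            rep = profiles[g[0]]  # compare against the group's first member
            if (profile_similarity(prof, rep) >= threshold
                    and profile_similarity(rep, prof) >= threshold):
                g.append(name)
                break
        else:
            groups.append([name])  # no close match: start a new job type
    return groups

# Hypothetical task inventory: percent of time each airman reports on each task.
profiles = {
    "A1": {"repair": 70, "inspect": 20, "paperwork": 10},
    "A2": {"repair": 60, "inspect": 30, "paperwork": 10},
    "B1": {"teach": 80, "paperwork": 20},
}
# Give all profiles the same task keys so the min() lookups never fail.
tasks = {t for p in profiles.values() for t in p}
for p in profiles.values():
    for t in tasks:
        p.setdefault(t, 0)

print(group_incumbents(profiles))  # → [['A1', 'A2'], ['B1']]
```

With these invented numbers, the two repair-heavy airmen fall into one group and the instructor into another; a real system would of course use far more tasks, incumbents, and a statistically grounded clustering method.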

If either of these two recent developments--the closer relationship<br />
of training and testing and the advances in job analysis--actually<br />
live up to our expectations, we are most optimistic as to the future<br />
of the airman testing program in the Air Force.<br />



Remarks to<br />
Military Testing Association Conference<br />

CAPTAIN J. P. MARTIN<br />
Commanding Officer<br />
US Coast Guard Training Center<br />

As you know, last year's conference was held at the Coast Guard<br />
Training Station at Groton, Connecticut. It was hosted by the Coast<br />
Guard Institute, under the command of Captain Kurchuki, who was chairman<br />
of that conference.<br />

Since then, a number of changes have taken place in our organization.<br />
Captain Kurchuki has retired; the Institute is no longer an independent<br />
command but has been combined with the training station to form what is<br />
now known as the Coast Guard Training Center; numerous changes in personnel<br />
have occurred; Lieutenant Commander Hallock, for example, has been<br />
reassigned as Commanding Officer of a cutter out of New Bedford. In<br />
fact, of those who attended last year's conference, Lieutenant Williams is<br />
the only returnee.<br />

For the rest of us, we are attending an MTA Meeting for the first<br />
time. But I have had good reports on last year's meeting. By all<br />
accounts it was a pleasant and informative conference. So I have reason<br />
to look forward to getting better acquainted with you. I am sure we<br />
will find many areas of common interest. We may find that we have<br />
similar problems which this conference might help us solve.<br />

So on behalf of myself, the members of my staff who are here, and<br />
officers representing other Coast Guard units, we are glad to be here,<br />
to join in the discussion, and to have a chance to develop new insights<br />
in the field of military testing.<br />

A year ago I was aboard the Coast Guard Cutter Northwind, conducting<br />
oceanographic surveys off the coast of Northern Siberia, and I can assure<br />
you that my sole concerns at the time were far removed from the field of<br />
testing. If someone had mentioned a "correlational cluster," I might<br />
have thought it had something to do with the Aurora Borealis. And if a<br />
boatswain's mate first class had come to me and asked, "Cap'n, how come<br />
I didn't do no better than I done on the Chief's Exam?" it would never<br />
have occurred to me to inform him that he was suffering from a "regression<br />
equation" and had wound up on the wrong side of "an unselected success<br />
ratio."<br />

Throughout a service career, of course, we are exposed to a wide<br />
variety of testing techniques--but usually in the role of victim. So<br />
it is a novel and intriguing experience to stand "on the inside" at<br />
last--to move within a charmed circle where one hears mention of "multivariate<br />
methods" and "attitudinal frames of reference."<br />


But at any rate, at the Coast Guard Training Center I have a good<br />
crew of experts who have been putting out many thousands of examinations<br />
over the years--and I am sure they have been accomplishing their purpose<br />
quite well.<br />

What is our purpose in conducting examinations in the Coast Guard?<br />


Well, before we can answer that, it is necessary to explain that we have<br />
several different kinds of examination programs. And this calls for a<br />
further word on the reorganization at the Training Center.<br />

What has happened is this: The Coast Guard Training Station and<br />
the Coast Guard Institute--which were two separate commands--have been<br />
combined--as I said earlier--to form the Coast Guard Training Center.<br />


I am assigned as Commanding Officer of the Training Center. As such,<br />
I report directly to the Commandant of the Coast Guard in Washington.<br />


The Coast Guard Training Center performs three primary missions.<br />
The Institute has furnished two of the primary mission divisions: the<br />
Institute Division for Correspondence Courses and the Examination Division<br />
for Service-Wide Examination Programs.<br />

The third primary mission division of the Training Center is the<br />
Resident Training Division. This division continues to perform the<br />
duties of what was formerly called the Training Station.<br />

The Resident Training Division comprises a number of technical<br />
schools which afford training to a continuous flow of students who come<br />
in from all over the Coast Guard. At any one time the student body will<br />
number about 700 men. They learn how to care for aids-to-navigation<br />
equipment, ranging from unlighted river buoys to manned lighthouses.<br />
They prepare to operate Loran Stations with 1300-foot transmitting towers<br />
which are strung from the far reaches of the South Pacific to icebound<br />
promontories north of the Arctic Circle. They learn to bake bread, to<br />
read the dits and dahs of Morse Code, and they acquire a great variety<br />
of other skills.<br />

The Institute Division was so named in order to retain the term<br />
"Institute," which has had an honorable history dating back to 1929.<br />
Over the years it has become known to hundreds of thousands of Coast<br />
Guardsmen seeking advancement with the aid of correspondence courses.<br />

The Institute Division prepares, prints, and administers courses for<br />
nearly all Coast Guard enlisted ratings: from aviation electronicsman<br />
to yeoman.<br />


At the present time a hundred and four courses are in the field.<br />
An additional twenty courses are being developed. Nearly all courses<br />
are undergoing constant revision. Currently, the enrollment averages<br />
about 13,000 students at any one time. The average lesson submission<br />
rate is very close to one per month per enrollee.<br />

The present Director of the Institute Division is Commander Dahlby.<br />
Like myself, Commander Dahlby has recently completed a tour of sea duty,<br />
and he tells me that they didn't talk much about "correlational clusters"<br />
on his ship either. Having come from a command of a buoy tender, the<br />
Cutter Salvia, he seems to think that "Festinger's Notion of Cognitive<br />
Dissonance" has something to do with fog signals. So I would say that<br />
it is just as well that we both are here.<br />

Both the Resident Training Division and the Institute Division use<br />
many questions as lesson aids. They conduct classroom and end-of-course<br />
examinations. These exams indicate the degree of success in completing<br />
a limited course of study. They are traditional teaching devices.<br />

But the division of the Training Center which I would think is most<br />
concerned with the agenda of this conference is the Examination Division.<br />
This division is under the direction of Commander Turner, yet another<br />
recent arrival. Miss Turner's Division is primarily engaged in the sort<br />
of testing that we are here to talk about.<br />

The Examination Division conducts service-wide (that is, Coast<br />
Guard-wide) examinations of military personnel. Their usage is governed<br />
to a large extent by the manpower needs of the Regular Coast Guard to<br />
perform its peacetime missions and by the anticipated needs of a greatly<br />
expanded Coast Guard in the event of mobilization. Peacetime missions<br />
remain fairly constant, so, except for updating material to take account<br />
of technological changes, the regular examination program tends to remain<br />
fairly stable.<br />

Because war plans are revised, expanded, or occasionally cut back<br />
to accord with international agreements, national policies, and changes in<br />
military concepts, our training and examination programs--for the<br />
Reserves--must remain flexible.<br />

Because of this apparent disparity in requirements between Regulars<br />
and Reserves--and also because of separate administrative histories--we<br />
have, in the Coast Guard, developed two distinct examination programs.<br />
I should like to make a brief comparison of the two.<br />




Requirements<br />

Regulars are required to take service-wide examinations for advancement<br />
to the three chief petty officer grades--E-7, 8, and 9--and for the<br />
first warrant grade, W-1.<br />

Reserves, on the other hand, are required to take examinations for<br />
advancement to all petty officer grades. In other words, they are required<br />
for 3d class, 2d class, 1st class, and chief--that is, pay grades E-4 through<br />
E-7. The Reserves do not have E-8 and E-9 grades. Examinations for<br />
Reserve advancements are separate and distinct from the examinations<br />
administered to the Regulars. A Reserve applicant for warrant grade,<br />
W-1, however, takes the same examination as does a Regular applicant.<br />

This difference in requirements for examinations would appear to<br />
take into account the fact that the Reserve's path of advancement leads<br />
through study courses, weekly instruction, and service-wide examination.<br />
By contrast, the Regular's prospects for promotion have a larger element<br />
of practical experience, daily performance of duty, and personal evaluation<br />
by his commanding officer.<br />

Both Regular and Reserve are required to complete an appropriate<br />
correspondence course before being considered for advancement. The<br />
difference enters when the Reserve aspiring to pay grades E-4, E-5, and<br />
E-6 must not only successfully complete an end-of-course exam, but must<br />
also compete in a service-wide examination. His end-of-course exam is<br />
administered by the Institute Division; his service-wide examination is<br />
administered by the Examination Division.<br />

Frequency of Examinations<br />

For the Regular: Half of the CPO specialties are examined in even<br />
years. The other half are examined in odd years. Warrant officer exams<br />
are administered during odd years only.<br />

For the Reserve: Examinations in all enlisted specialties are given<br />
twice each year. Insofar as possible, these examinations are based on<br />
Navy examinations. Examinations for advancement to warrant (W-1), administered<br />
once each year, are the same as for Regulars.<br />

Preparation of Examinations<br />

Examinations for the regular program are completely revised each<br />
time they are given. They consist of new items furnished by item-writer<br />
specialists brought to the Training Center on temporary additional duty<br />
prior to each group of examinations. Their offerings are combined with<br />
items selected from an item bank. Our bank of questions now contains<br />
approximately 32,000 items of carefully guarded knowledge.<br />


As for the Reserve examination program: In 1953 BuPers authorized<br />
the Coast Guard to use Navy examinations in our Reserve Program. These<br />
were administered from Coast Guard Headquarters until 1962. At that time<br />
administration was transferred to the Coast Guard Institute. The Navy<br />
exams are adapted to Coast Guard use insofar as possible. Some Navy<br />
exams are not adaptable to Coast Guard use--notably those for storekeeper<br />
and yeoman, since our qualifications for these ratings differ from those<br />
for Navy personnel. In addition, we have certain emergency service<br />
ratings, such as Coastal Forceman and Dangerous Cargoman, which not only<br />
have peculiar names but are peculiar to Coast Guard mobilization requirements.<br />
For these we must prepare our own examinations.<br />

Finally, Scoring and Evaluation Procedures Differ:<br />

Exams taken by Regular applicants are scored manually at the Coast<br />
Guard Training Center. Item analysis is also done manually. These exams<br />
have not been adapted for scoring and analysis by data processing equipment.<br />
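For readers unfamiliar with what such an item analysis involves, the sketch below computes the two statistics a classical analysis typically produces by hand: item difficulty (the proportion of examinees answering correctly) and an upper-lower discrimination index. The response matrix is invented for illustration; the talk does not describe the Coast Guard's actual worksheet procedure.

```python
# Classical item analysis sketch: difficulty = proportion correct,
# discrimination = upper-half minus lower-half proportion correct.
# The response data are invented for illustration.

def item_analysis(responses):
    """responses: list of per-examinee lists of 0/1 item scores (same length)."""
    n_items = len(responses[0])
    # Rank examinees by total score; split into upper and lower halves.
    ranked = sorted(responses, key=sum, reverse=True)
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    stats = []
    for i in range(n_items):
        difficulty = sum(r[i] for r in responses) / len(responses)
        disc = (sum(r[i] for r in upper) - sum(r[i] for r in lower)) / half
        stats.append((round(difficulty, 2), round(disc, 2)))
    return stats

responses = [  # 6 examinees x 3 items, invented
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
for item, (p, d) in enumerate(item_analysis(responses), start=1):
    print(f"item {item}: difficulty={p}, discrimination={d}")
```

An item answered correctly by almost everyone (or no one) contributes little information, and an item that high scorers miss as often as low scorers discriminates poorly; both conditions show up directly in these two numbers.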

Exams administered to the Reserves, on the other hand, have been<br />
adapted to the Navy data processing equipment at the Naval Training<br />
Center at Great Lakes. We obtain scores, item analysis, and exam analysis<br />
by means of twice-yearly visits to Great Lakes. There is, of course, a<br />
greater volume of Reserve examining than of Regular.<br />

This, then, is the outline and scope of our examination programs as<br />
conducted by the Examination Division.<br />

One basic purpose in giving examinations is to obtain an objective<br />
measurement of military and professional knowledge to use as one of the<br />
factors in establishing an order of eligibility for promotion.<br />
By combining the results of comprehensive examinations with other evaluation<br />
factors, we hope to find the persons who are best qualified for<br />
promotion.<br />

Service-wide examinations also give us a clue as to the overall<br />
effectiveness of all our service training programs. As we standardize<br />
our methods and procedures we hope to be able to compare our training<br />
achievements with those of other services.<br />

I think we can do more to standardize our requirements for both<br />
Regulars and Reserves. Only by creating a parity between Reserve<br />
requirements and Regular requirements will we be able to judge the<br />
relative weaknesses and strengths in our two kinds of training.<br />

The differences existing between Regular and Reserve requirements<br />
for examinations are currently under scrutiny by the Coast Guard. Plans<br />


are afoot to bring the two examination programs into closer alignment.<br />
It is hoped that through the reorganization which is still taking shape<br />
at the Training Center, we will be able to streamline the execution of<br />
our present commitments. And then we will be able to assume an expanded<br />
mission.<br />

The big questions remain--and probably always will.<br />

How well do we measure what we are trying to measure? What do we<br />
wind up measuring after all?<br />

Are present techniques actually selecting for us the kind of men<br />
we want in positions of trust and leadership?<br />

Can we ever hope to accomplish this goal with "objective" testing?<br />

If not, to what extent are we "structuring" our group and our<br />
society with an inadequate tool?<br />

Perhaps if we could find the kind of person we are looking for, we<br />
wouldn't have to be so concerned with "pro-pay," "fringe benefits,"<br />
and "morale."<br />

So once again, we of the Coast Guard are glad to be here.<br />

We shall look forward to beneficial changes in our "attitudinal<br />
frames of reference."<br />

We hope that our "regression equation" comes down.<br />

(And I certainly hope that is a good thing to hope for!)<br />

Thank you.<br />



Remarks to<br />
<strong>Military</strong> <strong>Testing</strong> Association Conference<br />

COLONEL JAMES C. DONAGHEY<br />
Commanding Officer<br />
US Army Enlisted Evaluation Center<br />

It is with pleasure and a sense of pride, as the Commanding Officer<br />
of the US Army Enlisted Evaluation Center, that I have this opportunity<br />
to present to you the assessment of the Army's system of measuring enlisted<br />
proficiency. As you are the experts, I will leave the technical aspects<br />
of job proficiency testing and evaluation in your able hands, and I will<br />
use the opportunity of this brief presentation to:<br />

Discuss the programs of our Enlisted Evaluation System, with particular<br />
emphasis on the new programs that have tripled the scope of operations<br />
of the Enlisted Evaluation Center since the last <strong>Military</strong> Testing<br />
Association Conference;<br />

Present significant changes made in operating procedures here at<br />
the Center and a new method of documenting and reporting individual<br />
proficiency; and<br />

Point out the team effort that is required for the development<br />
of evaluation materials to support the objectives of personnel management,<br />
and the importance of attaining these objectives.<br />

The quality soldier has been sought by our Army since its earliest<br />
days; although most American pioneers were proficient in the use of<br />
the musket, it was not easy to find proficient soldiers even in those<br />
days when a limited knowledge of technical material was required. The<br />
quest for quality manpower has continued down through the years. We all<br />
certainly would agree that the way to proficiency has to be the quality<br />
man -- the right man for the right job at the right time.<br />

As Commander of the Enlisted Evaluation Center, my mission is to<br />
develop and monitor techniques for evaluating the occupational proficiency<br />
of enlisted personnel in the Army -- to maintain a field<br />
administrative system for operation of the program -- and to render<br />
centralized scoring and reporting of the results of the tests.<br />

From obscurity to the world's second largest testing center in six<br />
short years is the outstanding achievement of the US Army Enlisted<br />
Evaluation Center. Since March 1958, the Center has developed from a<br />
nine-man operation, located in a "borrowed" classroom, to more than<br />
200 military and civilian personnel (not including the field test control<br />
officers throughout the world).<br />


The initial objective of the Enlisted Evaluation Center in 1958 was<br />
to evaluate enlisted personnel in their primary MOS for award of proficiency<br />
pay.<br />

The successful integration of this program into the Army's system of<br />
personnel management prompted adoption of additional programs that could<br />
be administered and controlled through the enlisted evaluation system.<br />
Our system now supports--in addition to the proficiency pay program--the<br />
following functions of enlisted management:<br />


Primary MOS Qualification.<br />

Qualification for Promotion. Although this is used on a permissive basis<br />
within the Army, the system does furnish the commander an indication of<br />
the soldier who is best qualified to fill a position of responsibility at<br />
a higher grade.<br />

Secondary and Additional MOS Qualification. This function provides commanders<br />
and Department of the Army the information required to effect<br />
broader utilization of the soldier by considering areas other than the primary<br />
job in which he is qualified. It allows use of the whole man instead of<br />
part of him.<br />

Reserve Component Evaluation. This new program is aimed toward the evaluation<br />
of our citizen soldiers, the Army Reserve and the Army National Guard,<br />
on the same standard as the Active Army, and will assure the use of the<br />
same means of measuring results toward the same goal--a better soldier<br />
throughout the Army structure. This is one more important step toward<br />
the attainment of a one-Army concept.<br />

The Enlisted Evaluation System has answered the Army's need for an<br />
objective system of individual evaluation of enlisted personnel to support<br />
these programs.<br />

From a standing start of about 17,000 evaluations in 64 Army jobs in<br />
1959, the Center will evaluate almost one million personnel during fiscal<br />
year 1965. This will require publication of approximately 300 test aids<br />
and the development of over 1,000 evaluation tests.<br />

Our tests provide the commanding officer with the information with<br />
which to assess the capabilities of the individual soldier assigned to<br />
his unit.<br />

For example, a tank commander knows that the tanks assigned to his<br />
unit have a certain maximum speed, destruction capability, and cruising<br />
range. Likewise, an artillery battery commander knows that the guns<br />
with which he is equipped have prescribed muzzle velocities, ranges, and<br />
rates of fire. Not only must he know these specifications of his materiel,<br />
but he must know the particulars of the jobs employed in his unit to<br />


accomplish unit missions and the capabilities of his men to perform<br />
these jobs.<br />

The enlisted evaluation system provides the commanders with a more<br />
factual basis for making personnel management decisions.<br />

The impact of the Army's enlisted evaluation system on management<br />
can be expressed in three areas: First, it affords enlisted personnel<br />
motivation for greater job mastery and maintenance of currency in job proficiency,<br />
and is a morale factor, since personnel decisions can be based<br />
on objectively appraised merit. Second, it furnishes the unit commander<br />
with an objective measure for appraising individual competence, for<br />
determining those personnel best qualified for promotion, and for use<br />
in assignment and utilization of his personnel, and it highlights requirements<br />
for individual and unit training. Third, Department of the Army<br />
is provided with effective measures for implementing personnel management<br />
programs and applying uniform standards in personnel management.<br />

Direct support of these personnel management functions is provided<br />
by the Enlisted Evaluation Center.<br />

Since the last conference, the Center has achieved a significant<br />
advancement in the method of documenting and reporting individual proficiency.<br />
Installation of the Center's computer system provided the<br />
capability for instituting a totally redesigned evaluation report. Please<br />
refer to the copy of the MOS Evaluation Data <strong>Report</strong> which has been<br />
included in your brochure. Prior to computer application, individual<br />
proficiency was reported as a single numeric evaluation score, as indicated<br />
in the upper right-hand corner of your sample report. Now individual<br />
reports of MOS evaluation not only show the examinee's attained evaluation<br />
score, but also his strengths and weaknesses in the functional<br />
areas of his occupational specialty, as shown in the lower portion of<br />
your sample report. Your sample shows the seven subject areas on the<br />
Evaluation Data <strong>Report</strong> for an infantry senior sergeant, MOS 115.9. A<br />
detailed description of these areas is given in the Evaluation Test Aid<br />
for this MOS. Sgt Bascomb is Very Low, Low, or Typical in all areas of<br />
his MOS. He and his unit would benefit from his study in all these<br />
areas.<br />

Copies of this report are forwarded to the examinee's unit of<br />
assignment and the individual concerned. Special added distribution<br />
is made to DA, where results are used to determine various personnel<br />
actions.<br />

With this Evaluation Data <strong>Report</strong>, commanders can identify necessary<br />
on-the-job training requirements and those subject-matter areas in<br />
which formal school training is essential.<br />
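The banding idea behind such a profile report can be sketched as follows. The labels Very Low, Low, and Typical come from the speech; the 0-100 scale, the numeric cut points, and the sample area scores are invented for illustration, since the report's actual norms are not given.

```python
# Sketch of turning per-area evaluation scores into the profile bands named in
# the talk. The cut points and sample scores are assumptions, not the Army's norms.

CUTS = [(20, "Very Low"), (40, "Low")]

def band(score):
    """Map a 0-100 functional-area score to a profile band (assumed cut points)."""
    for upper, label in CUTS:
        if score < upper:
            return label
    return "Typical"

# Hypothetical functional-area scores for one examinee.
area_scores = {"Map Reading": 35, "Weapons": 18, "Leadership": 55}
profile = {area: band(s) for area, s in area_scores.items()}
print(profile)  # → {'Map Reading': 'Low', 'Weapons': 'Very Low', 'Leadership': 'Typical'}
```

The point of reporting bands rather than raw numbers is that a commander can read "Very Low in Weapons" directly as a training need without knowing the test's score distribution.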


At this point, I might add that in past MTA Conferences discussions<br />
were held concerning the experiences of the other services with profiling<br />
evaluation scores. As a result of this exchange of information, the<br />
Enlisted Evaluation Center was aided in its efforts to produce a Profile<br />
Data <strong>Report</strong> which accomplishes those purposes unique to the Army.<br />

Summary reports are distributed to major commanders in the field.<br />
These reports reflect the MOS, the spread of evaluation scores, and the<br />
total number achieving each score by grade and unit. With this information,<br />
commanders can appraise unit effectiveness and emphasize training<br />
requirements.<br />

The objective of Army Personnel Management is simply to obtain the<br />
maximum efficient use of manpower. To support the attainment of this<br />
objective through the management programs I have cited, the evaluation<br />
process requires the coordinated effort of the Center, which is the<br />
operating agency of the system; a group of 10 supervisory and 31 question-writing<br />
agencies, who furnish the individual questions and problems which<br />
make up the various tests; and 81 monitoring commands, who supervise and<br />
support the actual operation of the testing program at the many far-flung<br />
locations throughout the world where soldiers are located.<br />

The development of evaluation materials is a team effort. Test<br />
specialists here at the Center are in direct communication with their<br />
counterparts at the question-writing agencies. This team effort ensures<br />
that professional standards of test development are incorporated in the<br />
system. Through the application of measurement principles by<br />
professional personnel, the Army's evaluation instruments are in line<br />
with those standards employed in professional psychological measurement.<br />
This interaction between the Center and the schools is necessary to<br />
sustain the highest professional standards in our development of test<br />
aids and tests.<br />

Distinct advantages accrue from this team effort. For example:<br />
it ensures inclusion of accurate and current subject matter in the tests<br />
by the question-writing agencies; it provides for application of measurement<br />
principles by the Center's professional personnel; and this in turn<br />
ensures that the test measures the technical knowledge required of the<br />
soldier to perform adequately on his job.<br />

The military environment today is still one of the most complex<br />
social organizations in the country. The problem of finding the right<br />
man for the right job is a continuously expanding task.<br />


Yhir ck8rification and aroigment technique involve8 all of the<br />

farm of mmagesumt fmiliar to em& of you, that 18 tha identification<br />

of aptitudar, laarued � killa and knowledge, qualification for odvancemeat<br />

and relectlcan of peraonnal for opeclal � �����������<br />

Our preuent � y8te6 ha8 conridereble merit since it is producing<br />

the right man for the right job.<br />

I believe the 8electicm nystcm, to find the right man for the<br />

right job, should be dafigltlve In purpo80, yet r-la adainl8tratlvsly<br />

r-10, and above all, arrlot the cmdar in the oelection proceru.<br />

You ladier and gentlawn realira the awunt of work and proferriorral<br />

application required to dovelop the tcating lnrtrumaatr to achieve thir<br />

objective.<br />

In my opinion, we have an excellent measurement device to identify current knowledge on what the men "can do." However, more must be accomplished on the "will do" application of the individual once he has been placed on the actual job. We must not lose the confidence of commanders, whom we tell that the individual is job qualified, and then have the man fail miserably when he is assigned to the job.

I have examined the topics on the program for discussion, which I find most thought provoking. I look forward to the opportunity to study your discussion group reports and benefit from the tremendous collective, technical and professional knowledge represented by your presence. The Enlisted Evaluation Center will continue to support the Association's efforts toward future improvements.

Let me say once again that the US Army, and particularly the Enlisted Evaluation Center, appreciates the opportunity to host this Sixth Annual MTA Conference at Fort Benjamin Harrison.

This concludes the formal session today. I will be followed by Mr. Price, our Conference Chairman, who has some announcements to make.

Perspective

DR. HAROLD A. EDGERTON
President, Performance Research, Inc.
Washington, D. C.

First of all, let me express my thanks to the speakers this afternoon. They have done a beautiful job of saying, in a different way, the same things that I want to say again tonight.

I am listed here as the keynote speaker, and I have been watching keynoters of the political parties this summer to see just what they do, so I will know what to do tonight. First of all, they seem to look at the glorious and constructive record of the past -- what we have done that is notable, forward looking, and looks good on the record. The second step is to paint with dark colors the errors of the opposition and to offer many castigations for these errors. And the third phase is to point out the great and bounteous future which we offer. I propose to follow this as an outline this evening. It gives us a good framework within which to take a look at our own problems. I have chosen the term "Perspective" as a title. I thought this was sufficient to offer maybe some direction, maybe some closures.

To keep this keynote speech and its implications clear, we need to be sure that we are using some of the same points of view and some of the same definitions of terms. Let me define the term "test" as used in our discussion this evening. We are a group primarily concerned with the construction of achievement tests -- tests which show job knowledge, understanding and skills. A test, as we are using the term here, is a sample of behavior, drawn under such conditions that one may judge some set of skills, abilities, aptitudes, attitudes, or achievements on the basis of this sample of behavior. Testing then becomes a problem of constructing tests which make it possible to draw an appropriate sample of behavior. We are concerned with the adequacy of this sampling, and with its reliability. Are the sampled behaviors pertinent to our purposes? Then there are questions of uniformity of sampling, from time to time, place to place, and group to group.

This concept of a test as a sample of behavior is a very useful concept. It reduces the "magic" aspects of tests. Nothing seems mysterious about the idea of a sample. Whether you have had a course in testing at a university or whether you have sat at the feet of one of your technicians and had long discourses on the nature of tests, you know that when you draw a sample, there are errors of sampling. You know that you have to be careful what you are sampling. You know that you have to draw a large enough sample of behavior, so that it will represent what you are trying to sample. I am including within this concept of a test all of our procedures for drawing samples of behavior. I include here the sort of things we ordinarily include as tests. I also include the performance of the work-sample type of test. I include the subjective test; and we too often look down our noses at such measures at times, primarily because they have greater sampling errors under certain conditions than do other samples of behavior. I include an interview as a sample of behavior. I include anecdotal reports as samples of behavior. Reports by superiors on the performance of subordinates are reports of their sampling of the subordinates' performance. Even projective tests are samples of behavior, but it takes someone well-trained to tell you what he is sampling, how this sample is comparable to any standards, and what to do with it. When we look at testing as sampling, this point of view also subordinates the techniques of testing to the real role of tests, that of measurement. Techniques may take their proper place as ways and means of achieving the purposes of our tests.

And now to look back briefly on our achievements. Our glorious past, from the point of view of this group and this meeting, is one to which I can and do point with pride. I know of no group that has done as competent a job in achievement testing. This statement is supported by the behavioral facts of the membership of this group. The membership of this group has advanced the state of the art on all fronts -- job analysis, task analysis, job content, frequency, and so on -- the bases for test construction. They have improved our techniques of item construction, relating content to requirement, and adjusting difficulty to appropriate levels. They have advanced the techniques of item analysis; they have also promoted the necessity for item analysis. They have developed techniques for producing alternate forms of tests. For these and for their other achievements, I salute the members of this group, and I take pride in being here and feeling that I am really one of you.

Then we come to the opposition, but the opposition is hard to find, to identify. Who is our opposition? Are they the ones who refuse to use our products and try to make tests do things that they were never intended to do? Are they the ones that insist on using less adequate means of performance measurement than are readily available to them? Perhaps we are ourselves our own greatest opposition. That is a dangerous statement to make after pointing with such pride to the achievements of this group. But among us, I suspect that each of us, including your speaker, has been guilty of being his own greatest opposition. When we see testing in terms of only one set of techniques, we are missing the boat. At least we are putting a good hole in its bottom so it can sink with us. When we fail to base our testing on adequate analysis of performance requirements, we are damaging our program. When we fail to recognize and state the specific purposes of the test in operational terms, we are running the risk of cutting our own professional throats.

We want to know what it is we are trying to measure. You do one kind of job when you are trying to measure the status of knowledge. You do another job when you are trying to bring in skill and reasoning, and you need another sort of test item when you are trying to measure memory for obscure facts. There are times when each of these purposes is pertinent, but do not use them blindly. Use them knowingly. It brings to mind one definition of a gentleman -- one who never unwittingly offends another. We need to know how the test will be used; we need to know what kinds of persons will take the test; we need to know where and how the test takers learn the material covered by the test; we need to know these as part of the mission of the test we are trying to build, and until we know that we run a serious risk of being our own best enemy.

Another current form of opposition, and perhaps not as real as those just pointed out, is represented by those who seem to enjoy swinging at straw men. They single out some of our weaker, less adequate test items -- those that may be viewed with alarm, with scorn, derision, double meaning -- and hold up to ridicule all testing because they found a flaw in our work. Some openly question the use of tests at all. I have not found out what they propose to substitute for testing, but I suppose there must be something from the "good old days" which they find adequate. Now when you look at the kinds of things published by our critics -- a castigation of tests built on some misuse of test items from such tests as the Bernreuter or the Bell Inventory, an attack on testing in Fortune Magazine some ten years ago, and, more recently, a very interesting book by Dr. Banesh Hoffmann, who seems to believe that multiple-choice questions are the bane of our social order -- should we ignore their strident and exaggerated criticisms? Within each such criticism is some useful information. This is why I cannot label them as our most dangerous opposition. I personally value Dr. Hoffmann as my most competent test item critic. He accuses me of the worst kind of skulduggery and of a great desire to take unfair advantage in writing test questions. So when we read these articles and hear these speeches that take us to task for what we may consider a minor sin, or some error that we have outgrown -- take a second look. We may learn something, perhaps something useful.

One cannot fight such critics effectively because of the many facts and assumptions which must be righted. But from them we can see places where we might improve: how we might build more acceptable programs, how we might better market our products, and so on.

The third phase of this talk tonight is our look ahead to see if there is anything in the future other than the manufacture of more tests just like the latest models we have been making. Recently I visited an automotive museum. One thing struck me forcibly: there were cars prior to 1912 that looked like overgrown buggies, powered with washing machine motors. But all of those cars built since 1912 were not so old-fashioned. About all we have done with automobiles since 1912 is improve them. We have put on bigger motors and better tires; but we have made little real change in them -- here in a period of 52 years no real basic change; improvements, yes; change, yes; but no real breakthrough.

I think those of us in the testing business are in that same boat -- sad to say. The development of the format for objective tests, which we all use and think of quite highly, was a product of a team like this. We have improved them, shined them up, and dressed them up in different forms; but we really haven't made a basic breakthrough in all that time.

The Army Alpha test used in World War I was a milestone in test development. That test was in objective form. It was certainly a useful test, and it had in it many of the kinds of content we currently use. In fact, for some purposes, it is still usable. In fact, it is still on the market. But we build tests for many other purposes. We have used improvements of the techniques, but we have yet to realize a real breakthrough. We write items with one, two, three, four, five, and six alternatives; one of them is right; the rest of them are wrong. They did not have at that time the refinements we have developed in item analysis, using biserial coefficients, or whatever other statistics you may use at your technical altar. We did not have factor analysis; as a matter of historical fact, psychometrics did not really get going until the days of Carl Spearman, followed later by Thurstone. The idea of item homogeneity is a still more modern concept. But I don't consider these as breakthroughs. They are improvements in just the same way that balloon tires were an improvement over the old high-pressure hard tires, or in the same way some of our modern carburetors are improvements over the simple carburetors of the past.
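The item-analysis refinements mentioned here are easy to make concrete. As an illustration only (the function and data below are mine, not the speaker's), a point-biserial coefficient -- the computational cousin of the biserial -- measures how strongly getting an item right goes with a high total score:

```python
from math import sqrt

def point_biserial(item_correct, total_scores):
    """Correlation between a 0/1 item response and the total test
    score -- a standard item-discrimination statistic."""
    n = len(total_scores)
    mean = sum(total_scores) / n
    sd = sqrt(sum((x - mean) ** 2 for x in total_scores) / n)
    p = sum(item_correct) / n            # proportion answering correctly
    if sd == 0 or p in (0.0, 1.0):
        return 0.0                       # the item cannot discriminate
    mean_pass = sum(s for c, s in zip(item_correct, total_scores) if c) / (p * n)
    return (mean_pass - mean) / sd * sqrt(p / (1 - p))

# A discriminating item: examinees who answer it correctly
# tend to earn higher total scores.
r_pb = point_biserial([1, 1, 1, 0, 0, 0], [9, 8, 7, 4, 3, 2])
```

A coefficient near 1 marks a sharply discriminating item; one near 0 marks an item that tells the test builder nothing about overall proficiency.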

Despite all this, however, I am optimistic about the real possibility that we are on the eve of some major breakthroughs in testing. What they are, I do not know. I am equipped with neither that kind of vision nor wisdom. By a breakthrough, I mean a change or new idea that alters the whole perspective of test construction as well as test usage. Are there situations and times when another technique would do better; for example, better as the first step in determining what factors in a job are really crucial, or really pertinent? Instead of trying to cover the entire job, maybe there are some elements in the job that are really crucial. There were some early flags of that kind back in the 30's in the development of job questions by the U.S. Employment Service. It was asked that these test questions discriminate between the master or journeyman, and the helper. Their particular purpose in the employment office was to identify the men who would like to upgrade themselves, but who really did not have the knowledge of the master craftsmen they had worked around. But many had worked enough so that they had learned something of the job. Maybe there are some clues in using the "How do you know?" "What did he do?" kind of questioning. Only once in my career have I validated a set of selection tests all of which measured up to the criterion of performance. Rather than using ratings of competency, we tried a kind of job analysis attempting to see what supervisors saw when they rated a man's performance as "good." We asked "Who is your best man? What does he do or fail to do by which you know that he is your top worker?" We would continue the conversation about one man for 30 to 60 minutes, finding out what he did, how he worked, how he handled his contacts, how he cooperated with the home office, everything we could about him. Then we would ask, "How about one of your poor ones? Who is the next one you would drop if you had to cut your work force? How do you know he is that bad?" We found enormous overlap in the observable performance of the better as compared with the poorer men -- how each worked, what he did and what he failed to do -- but there was a small area of difference. It did not tell us too much about job duties, but it certainly told us how the company valued job performance. And this was the information that gave us the clues as to what tests to select. Maybe this point of view contains an idea we should investigate. Instead of looking at all the duties, functions, and responsibilities of the men, try to learn in what ways we can spot a good one when we see one.

Let us not forget that there is a new set of tools at our disposal; they are known as computers. Computers and their accessory equipment certainly ought to open new doors to us. They have already opened many doors. The biggest and most important is that they permit us to do jobs that were too big to do before. They handle much more data. Before computers, how big a job of item analysis did you tackle? One hundred high cases; one hundred low? That would be close to the limit of what you would try to do. With computers we don't think anything of using several thousand cases in our item analysis. We are tackling problems that are bigger and bigger. Some of you may remember the day, it is before my time I might add, when six correlation coefficients would enable one to earn a Ph.D. degree. As time went on more correlations and more complex correlation functions were required. We are collecting and using bigger and bigger quantities of data in support of our efforts. These are improvements, but not breakthroughs, because we have been using the same old techniques, but with increased effectiveness. Computers do suggest that we can do some things in ways never done before. For example: One group of people for whom I have constructed tests for some years are so bright I am not able to write items tough enough for them. These are the science talent participants for the Westinghouse Scholarships; these boys and girls are really bright. Nevertheless, I'm still looking for talent even in that group. I want a test which discriminates particularly among those in the upper half of the contestants. I want a test which will be reflected by a frequency distribution that has a positive skew. Some test questions which come within the proper difficulty level may depend upon remote and unimportant bits of information. Is it, for example, really important to know precisely how many light years Arcturus is from the Earth? Or even to know the answer within a thousand light years? This is the kind of question that is answered correctly by few people, but still did not seem to be just what the situation demanded. Then we tried multiple choice questions; but instead of having just one right and 4 wrong answers, we thought we would have one, two, three, four, or even five right answers. A score of 1 is earned if all the alternatives are marked correctly, and if there are any errors the earned score is zero. This produced test questions of appropriate difficulty and gave the right kind of frequency distribution. Basically, I have used the simple concept of compound probability.
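The all-or-nothing rule just described is simple to state exactly. The sketch below is my own illustration of it, not the speaker's program: an item scores 1 only when the marked alternatives exactly match the key, and the compound-probability arithmetic shows why difficulty climbs -- with k independent mark/don't-mark decisions, blind guessing succeeds only (1/2)^k of the time.

```python
def score_item(marked, key):
    """All-or-nothing scoring: 1 only if the set of alternatives marked
    exactly matches the keyed set; any error earns 0."""
    return 1 if set(marked) == set(key) else 0

def p_guess(k):
    """Compound probability of guessing an item with k independent
    mark/don't-mark alternatives entirely correctly."""
    return 0.5 ** k

full_credit = score_item({1, 3, 4}, {1, 3, 4})   # every keyed answer marked
no_credit = score_item({1, 3}, {1, 3, 4})        # one omission voids the item
chance = p_guess(5)                              # 1 in 32 for five alternatives
```

Because most examinees miss at least one alternative on a hard item, scores pile up at the low end, giving the positively skewed distribution the passage asks for.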

Psychologists and test makers seem to have assumed that all that one needs to measure, or perhaps can measure, are behavior samples drawn from individuals. Have you ever discussed the validity of a test question with your field people and had the answer, "Well, that depends"? That depends on circumstances of the job, of the man's boss, of the working climate available, of the standards to which he is held.

Another area that we should explore is this matter of including in our factors of performance some which are outside the skin of the individual, but nonetheless influence the quality of his performance and of his knowledge. These pieces of information should appear on the right-hand side of the equation, not on the criterion side. We have not explored this. Since the days of Karl Pearson we have bowed down in front of linear measurement. Even our factor analysis is just a complex linear system. We have not broken with that tradition; we have not developed the tools to do it. Did you ever try to compute a correlation matrix using nonlinear regression lines? The first thing one does is rectify them to get back to linearity. You are well aware that there is so much in human performance and human value that is nonlinear. Even starting from mince pies, there is such a thing as too little and there is such a thing as too much. In between them there is some kind of optimal amount. With most of our personality measures we find the same sort of thing. Perhaps we call them bi-polar, but we ought to look at these as potentially nonlinear concepts.

Then we talk about patterns of performance. Within a set of questions, suppose there are three for which the answer "yes" is "out." We might call these "lethals." Conditional answers also might be explored. If you mark answer number 3 on question 1, then whatever you answer to questions 13 and 26 is wrong because your prerequisite knowledge was inadequate. There are a number of these pattern concepts that might be useful, and the method tested. Our computers open the door to such problems because this kind of exploration calls for considerable amounts of data. And I might add, your organizations have sufficient data. You have the opportunity of exploring rationales for weighting and patterning of answers.
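The "lethal" and conditional-answer rules above can be sketched directly. This is a hypothetical scoring routine of my own devising, following the speech's example that answer 3 on question 1 voids questions 13 and 26:

```python
def pattern_score(answers, key, lethals=(), conditionals=()):
    """Sketch of pattern scoring. `answers` and `key` map question number
    to the chosen/correct option. A lethal (question, option) pair zeroes
    the whole score; a conditional (trigger_q, trigger_opt, dependent_qs)
    denies credit on the dependent questions."""
    for q, opt in lethals:
        if answers.get(q) == opt:
            return 0                      # a "lethal" answer is disqualifying
    voided = set()
    for trig_q, trig_opt, dependents in conditionals:
        if answers.get(trig_q) == trig_opt:
            voided.update(dependents)     # prerequisite knowledge was missing
    return sum(1 for q, right in key.items()
               if q not in voided and answers.get(q) == right)

# Answer 3 on question 1 voids questions 13 and 26, so only
# question 5 can earn credit here.
score = pattern_score(
    answers={1: 3, 13: 2, 26: 1, 5: 4},
    key={13: 2, 26: 1, 5: 4},
    conditionals=[(1, 3, {13, 26})],
)
```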

Some of the possible questions are: How good is it for a man in a given MOS to be a member of the 1 percent who give the correct answer to question 47? Is this worth more than 1 point relative to the other questions? One might explore the weighting of items inversely proportional to the frequency of right answers, so that those who have the more unique knowledge become more visible.
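The inverse-frequency weighting suggested here is mechanically simple. A minimal sketch (my own names and toy data, not anything from the proceedings):

```python
def rarity_weights(responses):
    """Weight each item inversely to the proportion of examinees who
    answer it correctly, so rare knowledge counts for more. `responses`
    is a list of per-examinee 0/1 vectors, one entry per item."""
    n = len(responses)
    weights = []
    for item in zip(*responses):          # walk the item columns
        p = sum(item) / n                 # fraction answering correctly
        weights.append(1.0 / p if p > 0 else 0.0)
    return weights

def weighted_score(vector, weights):
    """Sum the weights of the items this examinee answered correctly."""
    return sum(w for v, w in zip(vector, weights) if v)

# Item 0 is answered by everyone (weight 1); item 1 by one examinee
# in four (weight 4), so its holder stands out in the weighted score.
responses = [[1, 1], [1, 0], [1, 0], [1, 0]]
weights = rarity_weights(responses)
top_score = weighted_score([1, 1], weights)
```

Under this rule, the member of the 1 percent who answers question 47 correctly earns 100 times the weight of a universally answered item, exactly the visibility the passage asks about.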

We need careful definition of what we want each test to do. We need clearer statements of what we want it to measure and the conditions under which it must do its job. For a given measurement task we may need a test which yields 89% perfect scores, or one that nobody passes unless he makes a perfect score, and there are times when this is a reasonable requirement. What kind of discrimination does your test need to make? Does each question need to operate at the one percent level? Are you asking it to discriminate in some other fashion? Do you want the test to be used as a motivational device -- to make your test takers anxious to get out and do more, better, and faster? Do you want a test that is basically a teaching device? That was the initial motivation of S. L. Pressey's teaching machine. In using his machines, every time a student pushed an answer button to answer the test and got the right answer, the machine went on to the next question. If he got a wrong answer he kept on pressing answer buttons until he got the right answer. Now we call this reinforcement: immediate feedback.

There are some other questions to which we do not have answers. How valid should a test be? We have heard about maximum validity, about lack of validity; but really, what is the optimal validity we should look for? Another question is "How rapidly and under what conditions do test scores lose their validity?" I have considered the possibility of submitting a research proposal for the development of test record inks -- inks for making test records -- so that the test score for each test would be faded out just at the time the test score had lost its validity. At one time in a counseling center, we set up a hard and fast rule: "Any personality, interest, or similar test has lost its validity after 3 months." After that, obtain a fresh score -- a sample of what that individual is like now. And it might be noted that people seeking counseling are perhaps least stable in such characteristics. For basic aptitude tests such as the Ohio State Psychological Test, we asked for a fresh test or sample after one year. We need more knowledge about this question not only in terms of the test, but also the intervening events. In measuring how well an electronics technician performs on the job, you may get a measurement at the time he completes his training. Suppose he had to wait 60 days for an assignment; how valid is his test score after 60 days with no practice in the electronics shop? How valid would his test score be?

In being here tonight, I have tried to point to some of the future and to stir up your thinking. I have tried to illustrate the "test" as a sample of behavior. I want to cheer for extending and trying out new concepts of answer patterns in test formulation. I want to see us develop the use of computers to do things in test scoring and test analysis that we have only talked or dreamed about, and to do some things that we have not even started to imagine. Keep your ways and means, your techniques, in proper perspective; and keep in front of you a clear idea of the purpose of the test itself by defining its uses, as well as its limitations, and, through these, extending its usefulness as a measuring device or way of drawing more adequate and more effective samples of behavior.

Approaches to Improved Measurement

CLAUDE F. BRIDGES, Chairman
US Army Enlisted Evaluation Center

In response to the consensus of the recommendations of the participating agencies, the conference theme is "Increasing Measuring Efficiency of Evaluation Instruments." This series of theoretical symposia is designed to explore ways of obtaining a marked increase in the percentage of overlap between job proficiency factors and the evaluations of the current job proficiency of enlisted personnel. Special attention to innovations and possible breakthroughs in ways of increasing the covariance between achievement and measuring instruments is long overdue. Concerted, intensive efforts will be necessary if we are to make significant inroads into what Captain Hayes aptly referred to yesterday as "the outer space of testing." Mere "polishing" activities using the usual test validation and item analysis procedures obviously are not enough.

Since World War I, such polishing activities have indeed improved tests but, as Dr. Edgerton so clearly pointed out in his keynote address, have not made genuine changes. Perhaps the most continuously extensive research in the intervening 45 or so years has been pointed towards improving the effectiveness with which academic aptitude tests predict success in school. However, after all these efforts, what is the usual correlation between such tests and school marks? Dr. Chester Harris in the Encyclopedia of Educational Research states that the correlation between "intelligence" tests and scholastic achievement typically falls within the range .40 to .50. The validation studies completed by the Evaluation and Analysis Branch of the US Army Enlisted Evaluation Center indicate that most of our recent Enlisted MOS Evaluation Tests have validities in this same range. Our attempts to adapt the most crucial aspects of the usual "custom-made" test development procedures to a high-speed, closely-timed test production line, and to apply previous and current research findings in the measurement area to our test development activities during the last six years, have enabled us to improve or polish our tests to a level approximately comparable with the typical academic aptitude tests.

This looks fine. But let us consider what a correlation of .50 actually means in terms of the percentage of variance in achievement typically being predicted after 45 years. Squaring .50, the upper end of the range, we get only a 25% overlap between the two variables. Figure 1 presents a graphic picture of this situation. Seventy-five percent of the variance is NOT touched yet!
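The squaring step is the heart of the argument and is worth making mechanical. A minimal sketch:

```python
def variance_explained(r):
    """Fraction of criterion variance accounted for by a predictor
    with validity coefficient r: the square of the correlation."""
    return r * r

low = variance_explained(0.40)    # 16% overlap at the bottom of the range
high = variance_explained(0.50)   # 25% overlap at the top -- 75% untouched
```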

Combinations of two typical evaluation instruments give slightly better results. The recent validity studies indicated that this year EEC can expect correlations in the neighborhood of those shown in Figure 2. The Enlisted MOS Evaluation Test combined with ratings on the "Commander's Evaluation Report" for most military occupational specialties should yield a multiple correlation (R) with peer ratings of about .57. (Peer ratings may not be the most valid possible criterion of all aspects of over-all job proficiency. However, there is considerable satisfactory evidence that, when obtained from raters who know the ratings are for experimental use only, peer ratings can provide practical and useful appraisals of a significant portion of complete job proficiency.) We are working especially to improve the tests for the "problem" MOS and expect even higher values next year. However, the improvements still will leave untapped about two-thirds of the factors involved in differentiating between different levels of job proficiency. Continued polishing will help decrease this unmeasured variance somewhat, but the usual polishing techniques, or even the addition of other commonly used types of measuring instruments, commonly raise a multiple correlation only a few hundredths of a point at the most.
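For two predictors, the multiple correlation follows from the two validities and their intercorrelation. The values behind Figure 2 are not reproduced here, so the numbers below are assumptions chosen only to land near the R of about .57 the text cites:

```python
from math import sqrt

def multiple_R(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors, given
    the validities r1 and r2 and the predictor intercorrelation r12."""
    r_sq = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
    return sqrt(r_sq)

# Hypothetical values: test validity .50, rating validity .40,
# intercorrelation .30 between the two instruments.
R = multiple_R(0.50, 0.40, 0.30)
untapped = 1 - R**2               # roughly two-thirds of the variance
```

Squaring the resulting R of about .56 leaves roughly two-thirds of the criterion variance unaccounted for, matching the "about two-thirds untapped" figure in the paragraph above.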

Indeed, we still do have a very long way to go. Today we will attempt to survey some of the possible routes to be followed. However, the best modes of transportation over them will remain to be determined. The general territorial areas which the routes must cover are job factors and personnel factors -- i.e., analyses of the characteristics of the job and analyses of persons performing it well versus those performing less well.

Probably the more immediately fruitful of the two general areas of investigation involves the development of ways to improve the determination of the factors in the job that discriminate between individuals with varying levels of job proficiency. Because of this, and of the interest common to all the agencies, a special session on "Job Analysis for Test Development Purposes" has been set up. The session on "Nonstatistical Criteria for Evaluating Items" likewise is pertinent to the area of job factors.

The other general area of invertigetion involves new and better ways<br />

of measuring personal factor-r of ptrronntl end of relating them<br />

appropriately to the job factora. The major typo8 of personal characteristicr<br />

for which appropriate mtesuret are nstdsd for verioua jobo might be<br />

rtructurtd AB follows:<br />

1. Job Knowledge (taxonomies of job content).<br />
<br />
2. Mental Skills (taxonomies of mental functions--mental manipulations--required<br />
by the job).<br />
<br />
3. Motor Skills (physical manipulation).<br />


4. Interpersonal relationships skills (social manipulations).<br />
<br />
5. Job motivation (application and effort).<br />
<br />
6. Test-taking motivation.<br />
<br />
7. Interaction of personal characteristics with situational<br />
characteristics (effectiveness of leadership provided; type of<br />
management; characteristics of subordinates, peers, supervisors,<br />
administrators, locale, spatial conditions, etc. Why does the<br />
same man perform very well in some places and not in others?)<br />
<br />
8. Other noncognitive personal characteristics (attitudes, interests,<br />
level of aspiration, initiative, drive, energy, perseverance,<br />
etc.).<br />

In the relatively short time available during this conference, it<br />
will be impossible to consider effectively all of the major kinds of<br />
approaches to improved measurement of the various types of personal factors.<br />
In order to suggest others for later consideration, several potentially<br />
fruitful approaches omitted in these seminars are mentioned. We will not<br />
discuss the statistical treatment of individual response data, such as<br />
beta-weighted correction for errors, analysis of response patterns, and<br />
scaling. Another omitted but potentially useful area would be the analysis<br />
of official personnel records (use of biographical data, aptitude tests,<br />
education, age, years in service, training records, honors, advancement<br />
rate, etc.).<br />

We will be able to explore only about one idea for each of the other<br />
possible approaches. For example, we can consider the utility of only the<br />
readability aspect of the communication problem in tests, omitting semantics,<br />
use of illustrations, grammatical construction of the stem, etc. Many of<br />
the omitted ideas involve primarily test polishing activities, but some<br />
of them should be remarkably effective in decreasing the unmeasured variance<br />
for a few military occupational specialties. Cronbach, in his Essentials of<br />
Psychological <strong>Testing</strong>, 1960, page 331, cites the results of an unpublished<br />
study by the Training Aids Section, Ninth Naval District, Headquarters,<br />
Great Lakes, Illinois, 1945, entitled “A Comparative Study of Verbalized<br />
and Projected Pictorial Tests in Gunnery.” He summarized the results as<br />
follows:<br />

“Training of Navy gunners had been validly evaluated by<br />
scores made in operating the guns. As an economical substitute,<br />
verbal and pictorial tests were developed. Identical information<br />
was tested in two forms, the same question being asked in<br />
words alone or by means of pictures supplemented by words.<br />
Questions dealt with parts of the gun, duties of the crew,<br />
appearance of tracers when the gun was properly aimed, etc.<br />


The pictorial test had a correlation of .90 with instructors’ marks<br />
based on gun operation, whereas the validity of the verbal test was<br />
only .62. The verbal test was in large measure a reading test; it<br />
correlated .59 with a Navy reading test, while the picture test<br />
correlated only .26 with reading.”<br />

After these introductory remarks were prepared, the October 1964<br />
issue of the American Psychologist arrived with an outstanding presentation<br />
of concepts closely paralleling some of those we have been considering.<br />
In fact, surprisingly enough, he used a quadrant idea to represent variance<br />
in common, somewhat like this chart (Figure 2) that had been prepared to<br />
depict clearly the magnitude of our challenge. There is no better way to<br />
close these introductory problem-defining remarks than to quote the opening<br />
sentence in Dr. Melvin R. Marks’ article entitled “How to Build Better<br />
Theories, Tests and Therapies.” Dr. Marks aptly stated:<br />
<br />
“The thesis of this paper is that psychological researchers<br />
too frequently define their problems in ways which intrinsically<br />
preclude solution; that they spend their time in gilding lilies<br />
of dubious quality rather than in determining why or how<br />
promising new blooms were blighted; further, that this<br />
deplorable state is characteristic of theories, therapies,<br />
and tests.”<br />


Figure 1. Percentage of Achievement Measured by Test Having<br />
a Validity Coefficient of .50<br />
(r_xy = .50; r²_xy = .25. The test accounts for 25 percent of the<br />
achievement variance; 75 percent remains unmeasured.)<br />
<br />
Figure 2. Percentage of Job Proficiency Measured by<br />
Combining Job Mastery Test and Ratings<br />
(Test (X1): r_x1y = .45, r²_x1y = .2025, contribution = .1632.<br />
Rating scales (X2): r_x2y = .45, r²_x2y = .2025, contribution = .1632.<br />
r_x1x2 = .20, r²_x1x2 = .04. R_y.x1x2 = .57, R² = .3264;<br />
67.4 percent unmeasured.)<br />
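The combination depicted in Figure 2 can be reproduced with the standard two-predictor multiple-correlation formula. A minimal sketch in Python (the function name `multiple_r` is ours; the standard formula applied to the chart's inputs gives R of about .58, so the chart's printed .57 presumably reflects rounding or a slightly different computation in the original):<br />

```python
def multiple_r(r1y, r2y, r12):
    """Multiple correlation R of a criterion y on two predictors x1, x2,
    computed from the three pairwise correlations."""
    r_squared = (r1y**2 + r2y**2 - 2 * r1y * r2y * r12) / (1 - r12**2)
    return r_squared ** 0.5

# Figure 2's inputs: test and ratings each correlate .45 with job
# proficiency, and .20 with each other.
R = multiple_r(0.45, 0.45, 0.20)   # about .58
unmeasured = 1 - R ** 2            # share of proficiency variance untapped
```

With a single predictor, as in Figure 1, the measured share is simply r squared: a validity of .50 accounts for 25 percent of the variance.<br />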


Summary Comparison of Purposes and Programs<br />
of the Military Services<br />
<br />
E. C. JOHNSON<br />
US Army Enlisted Evaluation Center<br />
<br />
Mr. Chairman, Fellow Conferees,<br />

As we begin to explore possible approaches to improved measurement,<br />
it will be well to review briefly, as a point of departure, the programs<br />
that are now in operation within the military services. You each have a<br />
detailed summary prepared by each service on its evaluation program.<br />
In this paper I will present a summary comparison of the purposes and<br />
programs of the military services. I have prepared a chart that depicts<br />
this comparison. This chart is listed as Appendix 1 at the end of this<br />
paper. All the military services use a paper-and-pencil multiple-choice<br />
type test as the basic instrument of their evaluation program. This<br />
test is supplemented in some cases by a performance test. These tests<br />
are very similar, with the exception of the Marine Corps, in that they<br />
are designed to measure the job knowledge required for satisfactory performance<br />
in the military specialty that is being evaluated. Each military<br />
specialty has a separate test for each subdivision of the specialty.<br />
These subdivisions are either pay grades or skill levels. The number<br />
of questions in these tests varies from 65 to 150. The NCO tests include<br />
coverage of general military and supervisory abilities as well as the job<br />
specialty.<br />

THE ARMY PROGRAM<br />

The Army uses two instruments in its evaluation program: an MOS<br />
evaluation test for each skill level and a rating form, the Commander’s<br />
Evaluation <strong>Report</strong> (CER). The CER is a rating form used to evaluate a<br />
soldier’s performance characteristics in a specific MOS at an established<br />
level of skill. One rating is accomplished by the soldier’s immediate<br />
superior and another by the immediate superior of the rater. The CER’s<br />
include scales for rating the individual’s cooperativeness, reliability,<br />
job performance, and other factors.<br />
<br />
After the tests are administered Army-wide, the MOS evaluation test<br />
answer cards and the completed CER’s are forwarded to the USAEEC. The<br />
two instruments are scored at the Center by computer and a composite<br />
evaluation score is obtained. An MOS Evaluation Data <strong>Report</strong> is prepared<br />
for each individual tested. This report contains the individual’s<br />
Evaluation Score and a profile showing his standing in each subject-matter<br />
area of the test. The unit commander receives a copy of the<br />
individual’s EDR, which he reviews and then forwards to the individual<br />
soldier. Thus the individual and his unit commander are made aware of<br />
the subject-matter areas in which the soldier stands “high” as well as<br />
those areas in which he needs to improve.<br />


This MOS Evaluation Score and the Profile Data <strong>Report</strong> is then used<br />
as a basis to:<br />
<br />
1. Award Pro Pay<br />
2. MOS Verification (Primary and Secondary MOS)<br />
3. Promotion Qualification Score (Used by commanders on an optional basis)<br />
4. Pay Grade and MOS Determination (Officer reverting to EM)<br />
5. Identify Training Needs (Both for the individual and the unit)<br />
<br />
Source of Items<br />
<br />
The Army depends upon Army service schools and other installations<br />
to write its test items. Most of the time the item writers are instructors<br />
in the MOS qualifying course. They are civilians, enlisted men, or<br />
officers who are assigned this duty in addition to their regular instructor<br />
responsibilities.<br />

THE NAVY PROGRAM<br />
<br />
The Navy uses a separate examination for each petty officer pay<br />
grade of all Navy ratings, with a few exceptions. These examinations are<br />
administered by examining boards wherever naval personnel are stationed.<br />
The completed answer cards are scored by the Naval Examining Center.<br />
Pass/fail categories are set up for each examination based upon a multitude<br />
of factors such as needs of the service at large, budgetary problems, and,<br />
above all, the degree of qualification and performance of each candidate.<br />
<br />
Each candidate has to pass the examination before the other factors<br />
are considered. Final advancement, assuming the candidate has fulfilled<br />
all prerequisite requirements and has attained a passing score on the<br />
examination, is made on the basis of relative standing on a final composite<br />
score. This composite score includes these five factors with maximum<br />
values as follows:<br />


1. Examination Grade<br />
2. Performance Factor<br />
3. Length of Service<br />
4. Time in Rate<br />
5. Number of Awards<br />
Maximum Composite Score<br />
<br />
If the candidate competes in an occupation where there are more<br />
vacancies than qualified personnel to fill them, then the examination<br />
score qualifies him for advancement.<br />
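The five-factor composite just described can be sketched as a capped sum. The maximum value attached to each factor did not survive reproduction here, so the caps below are placeholders, and the function name is ours:<br />

```python
# Hypothetical caps; the actual Navy maxima were lost in reproduction.
HYPOTHETICAL_MAXIMA = [80, 50, 20, 20, 10]

def composite_score(exam_grade, performance, service_years, time_in_rate,
                    awards, maxima=HYPOTHETICAL_MAXIMA):
    """Final advancement composite: sum of the five factors, each
    capped at its maximum allowed contribution."""
    factors = [exam_grade, performance, service_years, time_in_rate, awards]
    return sum(min(raw, cap) for raw, cap in zip(factors, maxima))
```

Candidates are then ranked on this composite, examination pass/fail having already screened the field.<br />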

This is the procedure used for advancement to pay grades E-4<br />
through E-7 to fill Navy-wide vacancies. Selection of personnel for<br />
pay grades E-8 and E-9 is accomplished by a selection board convened<br />
in the Navy Department, which reviews examination results and individual<br />
records.<br />
<br />
Source of Items<br />
<br />
The Navy’s item writers are assigned to the Naval Examining Center<br />
for a regular tour of duty.<br />

THE AIR FORCE PROGRAM<br />
<br />
The Air Force uses three specialty knowledge tests for each Air<br />
Force Specialty. They are the 3 (semi-skilled), 5 (skilled), and 7 (advanced)<br />
levels. These specialty knowledge tests are administered to Air Force<br />
personnel world-wide. After the 5 and 7 level tests have been given, the<br />
test answer cards are sent to the Personnel Research Laboratory where<br />
they are scored. The pass/fail rates for these tests are determined<br />
in accordance with criteria established by Headquarters, US Air Force.<br />
Scores on specialty knowledge tests are reported in percentile form and<br />
do not indicate the actual number of questions answered correctly. This<br />
score is used to identify those airmen who possess sufficient knowledge<br />
about their jobs to be considered for up-grading to a higher level job.<br />
This score is then used with other factors to assess the overall competence<br />
of airmen for the award of an Air Force Specialty.<br />
<br />
Source of Items<br />
<br />
The Air Force places its subject-matter specialists on TDY to the<br />
Personnel Research Laboratory. These subject-matter specialists are<br />
senior NCO’s in the Air Force Specialty for which tests are being developed.<br />
They develop all the test items for the three tests in their specialty<br />
while they are on this TDY.<br />


THE COAST GUARD PROGRAM<br />
<br />
The Coast Guard’s program basically parallels that of the Navy,<br />
so I will not go into its program in detail.<br />

THE MARINE CORPS PROGRAM<br />
<br />
The Marine Corps utilizes several test instruments in its evaluation<br />
program.<br />
<br />
The test that is most comparable in use to the evaluation tests<br />
used by the other services is its General Military Subject Test (GMST).<br />
This test is used to determine the eligibility of enlisted men for promotion.<br />
It is administered to eligible corporals through staff sergeants<br />
(E-4 through E-6). It differs from the other tests discussed in that it<br />
does not cover the military specialty or occupation of the individual<br />
but has a broad scope encompassing those subjects considered indispensable<br />
for Marines in a combat area. These subjects include tactical subjects at<br />
squad and platoon level, equipment and uniforms, field sanitation, first<br />
aid, etc. This test is given three times each year; the answer cards are<br />
sent to the Marine Corps Institute where they are graded, and the results<br />
are forwarded to Promotion Branch, Headquarters, Marine Corps and the unit<br />
commanders.<br />
<br />
A Marine must pass the GMST before he is considered for promotion.<br />
The other factor considered in his promotion is his proficiency in his<br />
technical specialty. This is determined by means of fitness reports submitted<br />
by his immediate superior over a period of years.<br />
<br />
As a matter of general interest the Marine Corps has three other<br />
tests.<br />
<br />
1. The General Military Subject Proficiency Evaluation Test (GMSPET)--<br />
<br />
This test covers general military subjects in more detail than the<br />
GMST and is used by the unit commander to identify the training needs<br />
and to judge the effectiveness of his training program. It is provided<br />
to the unit commander by the Marine Corps Institute. It is provided<br />
solely for the local commander. No report on test results is made to<br />
higher headquarters.<br />
<br />
2. Inspector General Test (IGT)--<br />
<br />
This test is an extract of the GMSPET with limited sampling. It<br />
is designed for use by the IG on his annual inspection. These tests are<br />
graded by the Marine Corps Institute and results are mailed to the IG<br />
and unit inspected. The report reflects unit performance and compares<br />
the unit with other type units and the Marine Corps norm.<br />


3. Officers’ Administrative Subjects Examination (OASE)--<br />
<br />
The purpose is to motivate every company grade officer to familiarize<br />
himself with a variety of administrative-type subjects. Every officer<br />
must take the examination, as a Lieutenant and again as a Captain.<br />
Successful completion of the OASE is not mandatory for promotion, but each<br />
officer has to continue to take the examination each year until he<br />
successfully passes it. The test items are selected by a board of field<br />
grade officers and assembled into a test at the Marine Corps Institute.<br />
The tests are graded by the Institute and a report is submitted to<br />
Marine Corps Headquarters, which in turn publishes the results and<br />
directs an appropriate entry in the officer’s jacket.<br />

For this symposium, two pertinent conclusions can be made:<br />
<br />
1. Although there are significant differences in these programs, there<br />
obviously is considerable commonality in the uses of the tests.<br />
<br />
2. Furthermore, the basic goal of the programs of all the services can be<br />
expressed, in a general way, as being to provide measures of the<br />
extent to which examinees have mastered the significant aspects of<br />
their military specialty.<br />


APPENDIX 1<br />
Summary Comparison of the Purposes and Programs of the Military Services<br />
<br />
ARMY -- Instruments: MOS evaluation test and rating form (CER). Uses:<br />
award pro-pay; MOS verification; promotion qualification score;<br />
evaluation score; pay grade and MOS determination; identify training<br />
needs. Source of items: subject-matter specialists at service schools<br />
and other Army installations.<br />
<br />
NAVY -- Instruments: test plus performance factor, length of service,<br />
time in rate, and number of awards (composite score). Use: promotion.<br />
Source of items: subject-matter specialists assigned to Naval Examining<br />
Center.<br />
<br />
AIR FORCE -- Instruments: test plus other factors. Uses: award of AF<br />
specialty; promotion. Source of items: subject-matter specialists on<br />
TDY to 6570th Personnel Research Laboratory.<br />
<br />
COAST GUARD -- Instruments: test plus performance factor, length of<br />
service, and selection board. Use: promotion. Source of items:<br />
subject-matter specialists.<br />
<br />
MARINE CORPS -- Instruments and uses: promotion (GMST); identify<br />
training needs (GMSPET); inspection (IGT); motivation (OASE); other<br />
factors.<br />

Concepts for Exploration in<br />
Proficiency Measurement<br />
<br />
MICHAEL A. ZACCARIA<br />
Lackland Air Force Base, Texas<br />
<br />
MARVIN KAS’<br />
Randolph Air Force Base, Texas<br />

Both the keynote speaker and the chairman of the theoretical symposium<br />
on approaches to improved measurement have asked that we develop concepts<br />
for exploration in proficiency measurement. Our theoretical symposium<br />
chairman has indicated that we can gain very little by merely polishing<br />
up our present measurement techniques. Thus, any approach to a real<br />
improvement in proficiency measurement must be a radical departure, at<br />
least in some ways, from the conventional concepts presently applied in<br />
this area. The traditional approach has stereotyped our thinking to such<br />
an extent that we are primarily concerned with merely refining and validating<br />
proficiency measurement devices. It might be well for us to begin<br />
this discussion by reappraising some of the concepts that have<br />
this discussion by reappraising aoaz of the concept8 that have ‘



When the main objective is to use a measure to select or screen<br />
individuals for a program, there is no question concerning the value<br />
of a norm-referenced system. On the other hand, the question arises<br />
as to whether we should group achievement measures in order to come up<br />
with one score.<br />

Our program chairman has stated quite succinctly that in order to<br />
make real progress in this realm of proficiency evaluation, we must<br />
begin with the job. We are definitely in agreement with this comment<br />
and wish to present some concepts of how this might be done. Some<br />
individuals have approached this task by the use of supervisory ratings<br />
as an ultimate criterion on the one hand and the development of tests of<br />
proficiency on the other. The approach has been one of intercorrelating<br />
the test batteries and criteria, thus deriving optimum prediction weights<br />
for each test.<br />

This course of action may well remind us of the story of a team of<br />
psychologists being called upon to develop prediction devices for a large<br />
company. After considerable work on these predictors, this group was able<br />
to predict quite highly with a large battery of tests the criterion which<br />
was based on supervisory ratings. The psychologists, being dissatisfied<br />
with merely being able to forecast these ratings with nearly perfect<br />
accuracy, decided to do some further research and to factor analyze the<br />
criterion and predictor variables. Lo and behold, the analysis being<br />
complete, they found that they were measuring job seniority.<br />

Validity of performance measures need not be related to another<br />
criterion. These could be ultimate criteria in and of themselves. Let<br />
us suppose that we wish to measure the performance of secretaries. The<br />
first thing we should do is start with the job. We should enumerate the<br />
tasks that she does in terms of product-like performance. A secretary,<br />
for example, may do the following: type letters or finished copy from<br />
drafts, take dictation, file correspondence, select materials from files,<br />
receive visitors, and answer inquiries over the telephone. From this<br />
brief description of the secretarial duties, we can quite readily develop<br />
some of the following criterion performance measures: we can measure her<br />
speed and accuracy of typing, her speed and accuracy of taking shorthand<br />
and her ability to transcribe her shorthand notes, and her ability to<br />
appropriately file correspondence and to find materials filed. Her ability<br />
to answer queries on the telephone may be somewhat more difficult. On<br />
the basis of each score that she makes, we can classify her in terms of<br />
satisfactory, unsatisfactory, and outstanding.<br />

The criterion for satisfactory on typing, for example, might be an<br />
average of 40 words per minute on a certain specified copy with no more<br />
than two errors per page for a two-page manuscript. Outstanding on this<br />
characteristic might involve 60 words per minute with no more than one<br />
error on a two-page manuscript. Similar standards could be worked out<br />
for each of the other criteria.<br />
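Cut scores of this kind make the classification purely criterion-referenced: each score is compared with a fixed standard, not with other secretaries. A minimal sketch in Python using the two-page-manuscript standards above (the per-page error allowance for “satisfactory” and the function name are our reading of the text):<br />

```python
def classify_typing(wpm, total_errors, pages=2):
    """Criterion-referenced rating of a typing sample against fixed
    cut scores: 60 wpm / at most 1 error is outstanding; 40 wpm /
    at most 2 errors per page is satisfactory."""
    if wpm >= 60 and total_errors <= 1:
        return "outstanding"
    if wpm >= 40 and total_errors <= 2 * pages:
        return "satisfactory"
    return "unsatisfactory"
```

A parallel rule could be written for each of the other task measures, and a secretary would carry one rating per task rather than a single pooled score.<br />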



Here we are trying to make several points. We can define and<br />
develop criteria in terms of certain relevant on-the-job behaviors.<br />
Each of these different types of behaviors can be measured separately,<br />
and they would be made more meaningful if they were measured in terms<br />
of relevant requirements. It is quite clear that some secretarial jobs<br />
may require different degrees of skills. Individual skills could more<br />
readily be matched to required job skills. There is no need to combine<br />
all of these scores into one score and then further put this score<br />
into a standard score or percentile conversion. This obscures the data.<br />
<br />
Furthermore, there is no need to correlate each of these scores,<br />
or obtain a weight for these scores by optimally correlating these<br />
against supervisory ratings. After all, it could likely be that we<br />
throw out the baby with the bath because the supervisory rating may<br />
not be based on any of the characteristics that are relevant to actual<br />
job success. We have all at least heard of cases in which secretaries<br />
were rated outstanding by the boss even though he has never seen them<br />
do many of the tasks typically required of a secretary.<br />

Test Reliability and Accuracy<br />
<br />
For the purpose of this presentation, we have discerned three types<br />
of test reliability. The first type deals with the equivalence of two<br />
or more forms of a test. The second concerns itself with the stability<br />
of the same test given at two different times. Homogeneity or sameness<br />
throughout the test is the third kind. There is no question as to the<br />
importance of each of these types of reliability in connection with<br />
aptitude and personality testing. A question does arise, however, as to<br />
the appropriateness of stability and homogeneity coefficients in achievement<br />
testing. There is no doubt as to the importance of equivalence of<br />
forms for a particular achievement test. The present authors raise a<br />
question on how one determines the equivalence of two forms in a training<br />
situation.<br />

In the ideal training situation any individual or group of individuals<br />
would theoretically score zero on any test at the beginning of training<br />
and 100 percent at the end of training. While it is true that few, if<br />
any, ideal situations of this sort ever exist, it is important to take<br />
the ideal situation for consideration. In an ideal situation such as<br />
mentioned above, if we used an end-of-course achievement test as a measure<br />
and used the traditional coefficient of correlation as the tool to<br />
determine the degree of equivalence of two forms of a test, we would<br />
find that the correlation coefficient is not very high. Even in a somewhat<br />
less ideal situation where we have scores bunching up at the low<br />
end during pretesting and bunching up at the high end at the end of the<br />
course, we would not obtain a very high reliability coefficient by computing<br />
it based on only posttest scores.<br />


Our proposal is, thus, that if we must have a correlation coefficient<br />
to indicate the equivalence of two forms, that data be accumulated<br />
at the end of various portions of a course of instruction. Two forms of<br />
a test could be administered to some individuals as a pretest, to other<br />
individuals during the course of instruction, and to others at the end of<br />
the course. In this manner, a correlation coefficient would show the<br />
true relationship between two forms of a test and would thus not be low<br />
due to restriction of range when students are well taught.<br />
<br />
Another possible measure of reliability of a test would be in terms<br />
of the accuracy with which it measures that which it is supposed to measure.<br />
This is closely akin to relevancy and validity of measurement and will<br />
be discussed in the next section.<br />
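The restriction-of-range effect behind this proposal can be illustrated with a small simulation. All of the numbers below are illustrative, not from the paper; we assume two parallel forms that differ only by independent measurement error. Pooling pretest, mid-course, and end-of-course examinees restores the spread of mastery that a posttest-only sample lacks, and the form-to-form correlation rises accordingly:<br />

```python
import random

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return sxy / (sx * sy)

random.seed(1)

def give_both_forms(mastery):
    """One examinee's scores on Form A and Form B: the same true
    mastery level plus independent error on each form."""
    return (mastery + random.gauss(0, 5), mastery + random.gauss(0, 5))

# Pooled design: pretest, mid-course, and end-of-course groups.
pooled = [give_both_forms(m) for m in
          [random.gauss(20, 8) for _ in range(100)] +   # pretest
          [random.gauss(55, 8) for _ in range(100)] +   # mid-course
          [random.gauss(90, 8) for _ in range(100)]]    # end of course

# Posttest-only design: everyone bunched near the top of the scale.
post_only = [give_both_forms(random.gauss(90, 8)) for _ in range(300)]

r_pooled = pearson_r(*zip(*pooled))      # high: full range represented
r_post = pearson_r(*zip(*post_only))     # lower: range restricted
```

The same pair of forms yields a markedly lower coefficient when only well-taught end-of-course students are sampled, which is the artifact the pooled design avoids.<br />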

Test Validity and Validation<br />
<br />
There are a number of erroneous concepts concerning achievement testing<br />
which should be clarified. When we speak of achievement or proficiency<br />
testing, we are, in essence, discussing a criterion. If this be the case,<br />
why do we have to wonder whether our test predicts job success? While it<br />
is true that there are more than just job components to the successful<br />
performance of a job, if we can at least accurately measure the various<br />
components of a job, we will have gone a long way toward the actual<br />
development of a criterion.<br />

There are those who feel that many achievement or performance tests<br />
must correlate highly with an ultimate “criterion of job success as<br />
determined by supervisory ratings.” We have already made our point<br />
concerning this aspect. If we have measured a secretary’s proficiencies<br />
in terms of speed and quality of typing, shorthand, and filing, we have<br />
already gone a long way towards measuring her total job performance,<br />
although we may have neglected to measure her initiative and motivation.<br />
These two latter traits, we will agree, can better be measured by supervisory<br />
ratings. We see no need, on the other hand, to validate the other<br />
measures by correlating them with supervisory evaluation of performance.<br />

We do not think that the traditional practice of obtaining item-total<br />
test correlations on achievement tests is very worthwhile. Some<br />
test developers seem to forget that this practice leads to homogenizing<br />
the test, a practice which is generally acceptable in aptitude or<br />
personality test development. In the case of achievement testing, however,<br />
this practice has very limited usefulness. We feel that item<br />
validation could better be achieved by administrat-<br />



If we must validate achievement and performance tests against<br />
on-the-job criteria, certain conditions must be present. Part of the<br />
on-the-job criteria must include simulated or real tasks or job sample<br />
tests. Each of these job criteria should be measured and designated<br />
separately. The relevant school or training criteria should also be<br />
measured in a similar manner and relationships should be established.<br />
If correlations are desired, they can be obtained by correlating the<br />
various relevant scores of not only those individuals who successfully<br />
complete a course of instruction, but also of those who do not successfully<br />
complete, or who go on the job directly without any training.<br />
<br />
We feel that these suggestions will open new avenues for exploration<br />
in proficiency and achievement evaluation.<br />

References<br />
<br />
Frederiksen, N. Proficiency tests for training evaluation. In Glaser,<br />
R. (ed.), Psychological research in training and education.<br />
Pittsburgh: University of Pittsburgh Press, 1961.<br />
<br />
Glaser, R. Instructional technology and the measurement of learning<br />
outcomes. Amer. Psychologist, 1963, 18, 519-521.<br />
<br />
Glaser, R. & Klaus, D. J. Proficiency measurement: assessing human<br />
performance. In Gagne, R. M. and others, Psychological principles<br />
in system development. New York: Holt, Rinehart, & Winston, 1962.<br />
<br />
Nedelsky, L. Absolute grading standards for objective tests. Educ.<br />
Psychol. Measmt., 1954, 14, 3-19.<br />
<br />
Thorndike, R. L. & Hagen, E. Measurement and evaluation in psychology<br />
and education. New York: Wiley, 1955.<br />
<br />
Tyler, R. W. Achievement testing and curriculum construction. In<br />
Williamson, E. G. (ed.), Trends in student personnel work.<br />
University of Minnesota Press, 1949.<br />
<br />
Zaccaria, M. A. & Olsen, J. Reappraisal of Achievement Measures.<br />
USAF Instr. J., 1953, 1, 73-75.<br />


Item Analysis Information for Test Revision<br />

VERN W. URRY<br />

US Army Enlisted Evaluation Center<br />

The item analysis coding procedures used at the US Army Enlisted Evaluation Center represent an application of Gulliksen's item selection techniques to MOS Evaluation Test redevelopment. These techniques were advanced by Gulliksen (1950) in his "Theory of Mental Tests." They call for the use in item analysis of the point-biserial correlation coefficient. We are currently using this statistic for that purpose.<br />

Item analysis codes are provided to indicate appropriate actions to be taken by item writers in regard to individual MOS Evaluation Test items. While a full explanation of the 8 codes used will not be given, the present paper will discuss the statistical bases for those action codes which have to do with the selection of acceptable items or the revision of items needing minor changes. The advantages in improved measurement to be gained by the implementation of these particular action codes will also be discussed.<br />

Basically, item analysis has as its purpose the development of a test with a given set of statistical characteristics, either through item selection or item revision. For a 100-item MOS Evaluation Test, the desired set of statistical characteristics would include a mean of 62.5, a standard deviation of 12.5, and a reliability of .80 as a minimum. The desired mean equally divides the range from a chance score of 25 to a maximum or perfect score of 100, and the desired standard deviation is determined by dividing 6 into the above range, since 3 standard deviations on either side of the mean contain, for all practical purposes, a normal distribution. The chance score is a function of the fact that each MOS Evaluation Test item contains 4 plausible alternatives; hence, if one were guessing, 1/4, or 25, of the items would, on the average, be answered correctly.<br />
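The arithmetic behind these targets can be written out directly; a minimal sketch (the variable names are ours, not the report's):

```python
# Derive the desired statistics for a 100-item, 4-alternative test.
n_items = 100
n_alternatives = 4

# Chance score: pure guessing answers 1/4 of the items correctly.
chance_score = n_items / n_alternatives        # 25.0

# The desired mean equally divides the range from chance to perfect.
desired_mean = (chance_score + n_items) / 2    # 62.5

# The desired SD divides that range by 6, since 3 standard deviations
# on either side of the mean span a roughly normal distribution.
desired_sd = (n_items - chance_score) / 6      # 12.5
```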

The item analysis action codes to be used by item writers take into account the desired set of statistical characteristics for MOS Evaluation Tests and the item statistics required to obtain MOS Evaluation Tests with these statistical characteristics. Item statistics relate to the statistical characteristics of a test in the following manner (Gulliksen, 1950):<br />


The test mean, M_x = Σ p_i (Formula 1)<br />

Where: p_i, or item p-value, is the proportion of individuals answering item i correctly.<br />

The test standard deviation, s_x = Σ r_it s_i (Formula 2)<br />

Where: r_it s_i is the point-biserial item-test correlation multiplied by the item standard deviation, s_i = √(p_i(1 - p_i)).<br />

The test reliability, r_xx = [K/(K - 1)][1 - Σ s_i² / s_x²] (Formula 3)<br />

Where: K is the number of items in the test, s_i² = p_i(1 - p_i) is the item variance, and r_it s_i is the point-biserial item-test correlation multiplied by the item standard deviation.<br />

In this regard, any item action coding procedures that ensue from an item analysis should take into account the end product, or MOS Evaluation Test, which would result, given the selection or revision of items to be included in a redeveloped test.<br />

In keeping with a desired test mean of 62.5, all item p-values should range around .625. For a desired test standard deviation, p-values should also range around a median value, i.e., as near .50 as is practicable in view of the desired mean, since s_i² = p_i(1 - p_i) and is at a maximum when p_i = .50.<br />


The selected or acceptable items are coded in the present procedure with codes OK or MA (minimally acceptable). Items to be revised, code RV, reach the MA level for item-test correlations, but either distractor-test correlations or distractor proportions indicate they need minor modification to be in consonance with the statistical characteristics desired in MOS Evaluation Tests.<br />


Distractor-test correlations indicate revision is necessary if they exceed the correlation set at the MA level. Distractor proportions are indicated for revision if they fall outside a range from a proportion of .05 to a proportion of .25. A minimum proportion of .05 was selected to provide that some plausibility exist in all distractors, in order that a theoretical chance score of 25 is tenable. A maximum proportion of .25 was selected because (1) all responses, distractors and correct alternatives, would have a proportion of .25 if examinees had no knowledge regarding the item, and (2) when the p-values of correct alternatives approach .25, chance responses tend to lower item-test correlations (Guilford, 1956, p. 437).<br />


Applied in conjunction, the above limits provide: (1) that item p-values may range between .25 and .85, which is reasonable in view of the desired mean; (2) that the resultant test mean is near the desired value; (3) that maximum item standard deviations are obtained, since item p-values are maintained at median values; and (4) that item p-values are within ranges wherein item-test correlations are higher, since extreme or nonmedian p-values for items tend to reduce item-test correlations.<br />
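The selection logic described above can be sketched as a single decision rule. The correlation levels (.30 for OK, .20 for MA with larger MOS samples) are the ones this paper cites; the function name, the REJECT label for items below the MA level, and the order of the checks are our assumptions, since the full 8-code system is not reproduced here:

```python
def action_code(item_test_r, distractor_test_rs, distractor_props,
                ok_level=0.30, ma_level=0.20):
    """Assign an item-analysis action code (sketch of the scheme above)."""
    if item_test_r < ma_level:
        return "REJECT"  # below the minimally acceptable correlation level
    # Distractors must not correlate with the total at the MA level or
    # above, and their proportions must fall in the .05-.25 range.
    distractors_ok = (all(r < ma_level for r in distractor_test_rs) and
                      all(0.05 <= p <= 0.25 for p in distractor_props))
    if not distractors_ok:
        return "RV"      # acceptable correlation, but needs minor revision
    return "OK" if item_test_r >= ok_level else "MA"
```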

To illustrate how Formula 2 works in regard to the item coding procedures, we can take the example of items coded OK and MA:<br />

Given 100 items with item-test correlations of .30 (the OK level for larger MOS samples) and item standard deviations of .48, the test standard deviation would be:<br />

s_x = 100(.30)(.48) = 14.4<br />

With 100 items having item-test correlations of .20 (the minimally acceptable level for larger MOS samples) and item standard deviations of .48, the test standard deviation would be:<br />

s_x = 100(.20)(.48) = 9.6<br />

With items selected or revised to exceed these item-test correlations and to approach these item standard deviations, it is apparent that a desired test standard deviation is made more attainable.<br />

To illustrate how Formula 3 works in relation to the item coding procedures, we can use 100 items with the average item variance of (.625)(.375) = .23 and a test standard deviation of 12.5. The minimum test reliability in this case would be:<br />

r_xx = (100/99)(1 - 23/156.25) = .86<br />

In this manner adequate reliability tends to be insured by the item coding procedures.<br />
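The worked figures can be checked directly with the numbers given in the text:

```python
# Formula 2: test SD as the sum of item reliability indexes (r_it * s_i).
sd_ok = 100 * 0.30 * 0.48   # items at the OK correlation level -> 14.4
sd_ma = 100 * 0.20 * 0.48   # items at the MA correlation level -> 9.6

# Formula 3: minimum reliability with average item variance .23 and a
# test SD of 12.5 (test variance 156.25).
rel_min = (100 / 99) * (1 - (100 * 0.23) / 12.5 ** 2)  # about .86
```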


Since item statistics relate to the statistical characteristics of tests as indicated in the above formulas, it is further possible for test specialists to exercise more control over these aspects. This further control is possible because each statistical characteristic of a redeveloped test can be estimated from item statistics, which will be provided on each MOS Evaluation Item Card. The summation of these item statistics and entry into the proper equations for the estimation of each of the test statistics is thereby facilitated. Items to be included in a redeveloped test could then be varied in order to more closely approach the desired set of statistical characteristics.<br />

In summary, action coding procedures tend to channel efforts in item selection and item revision towards MOS Evaluation Tests of desired characteristics; and access to pertinent item statistics on the part of test specialists will provide for a more accurate approximation of the desired set of statistical characteristics when redeveloped tests are subsequently used in evaluation testing.<br />

Reference<br />

Cullikaen, H., ‘fieor]! ef F?ental T& Xew ‘York: Wiley, 1958.<br />


Item Validity Information as a Basis for Test Revision<br />

VERN W. URRY and EDWIN C. SHIRKEY<br />

US Army Enlisted Evaluation Center<br />

Most of us would probably agree that concepts such as test reliability and content validity are important characteristics of tests which are used to evaluate ability or knowledge of human beings. All too often, however, these concepts are over-emphasized at the expense of a test characteristic which is considerably more important.<br />

A test may be perfectly reliable, yet may be measuring entirely nonvalid variance, i.e., it could be measuring the wrong thing but doing it in a beautifully perfect way. Content validity, very important for obvious reasons, is occasionally viewed by laymen as the only kind of test validity that is required in order to label a test as being a good one. Now the absurdity of this point of view becomes most apparent when one analyzes the types of decisions that are going to be made about enlisted men on the basis of their test scores. Most of the decisions are based upon an important assumption. The assumption made is that those individuals who score high on the test are performing better on the job (and thus should be rewarded), and those scoring lower on the test are accordingly poorer performers on the job (and hence should not be rewarded). Whether this is a true assumption should not be left to crystal-ball gazing or "armchair" psychology, but should be subjected to verification. This verification may be referred to as establishing the degree of empirical validity of a test, i.e., how do test scores actually relate to an external measure of on-the-job performance? We at the Department of the Army have recently incorporated a plan of validation for both the evaluation tests and the Commander's Evaluation Report. Details of these validation procedures will be discussed in another paper. The purpose of this paper is to describe a specific facet of the study which we call the "preliminary validity report."<br />

The preliminary validity report is intended primarily for the use of test specialists in the Test Development Branch of the US Army Enlisted Evaluation Center. The purpose of the report is to provide guidelines in test revision based on pertinent statistical data which are not available from our regular item analysis statistics. This present validation policy represents an improvement over previous validation efforts in that validity data covering the entire test outline and individual test items can be provided prior to test revision, to allow for their use in the decision-making process of test development.<br />



Test reliability? -- Certainly some reliability is necessary in our tests in order to obtain empirical validity. Content validity -- again, we are certainly concerned that items in our tests should adequately cover the relevant aspects of job behavior. But we also feel that these are only the minimum desirable characteristics that a test should have. Our earlier validation studies asked the question: Does this test possess empirical validity? In the present validation procedure, there is no longer an emphasis solely upon the question -- does this test possess empirical validity? -- but more directly and more appropriately, the questions are: Why is the test valid? and How can its validity be improved?<br />

As noted in the previous paper, this one also represents the application of item selection techniques advocated by Gulliksen in his "Theory of Mental Tests." Again, these techniques require the use of the point-biserial correlation coefficient for item-test and item-criterion correlations. The specific question we ask is: How does each of these items influence the validity of the total evaluation test?<br />

The data provided for items in the preliminary validity report of our validation sample include: (a) item p-values, (b) standard deviations, (c) item variances, (d) point-biserial item-total test correlations, (e) point-biserial item-criterion correlations, (f) item reliability indexes, and (g) item validity indexes.<br />

The indexes of reliability and validity are obtained by multiplying the standard deviation of each item by its respective point-biserial item-test correlation or point-biserial item-criterion correlation. They represent the contribution that the item makes to the reliability and the validity of the total test, respectively.<br />
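Both indexes can be computed from raw data. A minimal sketch, assuming dichotomous 0/1 item scores and using a standard computational form of the point-biserial correlation (the function names are ours):

```python
import math

def point_biserial(item_scores, totals):
    """Point-biserial correlation between a 0/1 item and a score vector."""
    n = len(totals)
    mean_all = sum(totals) / n
    sd_all = math.sqrt(sum((t - mean_all) ** 2 for t in totals) / n)
    p = sum(item_scores) / n
    mean_pass = (sum(t for x, t in zip(item_scores, totals) if x)
                 / sum(item_scores))
    return (mean_pass - mean_all) / sd_all * math.sqrt(p / (1 - p))

def item_indexes(item_scores, test_totals, criterion_scores):
    """Reliability index (r_it * s_i) and validity index (r_ic * s_i)."""
    p = sum(item_scores) / len(item_scores)
    s_i = math.sqrt(p * (1 - p))  # SD of a dichotomous item
    return (point_biserial(item_scores, test_totals) * s_i,
            point_biserial(item_scores, criterion_scores) * s_i)
```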

It is the last concept, the index of validity, that now possesses special significance to our test specialists. By inspection of the indexes of validity for each item, they can determine which items are at least making a positive contribution to the validity of the total test, and insure their inclusion in subsequent test revisions. If item validity indexes are available for each item that is included in a test, it will be possible to estimate the total test validity before it is administered. The paper on item analysis has indicated how test characteristics such as the mean, standard deviation, and reliability can be more closely controlled. Use of the item validity index enables us to exercise additional control over the most salient feature of a test, its ability to discriminate between those individuals who are good and poor performers on the job.<br />


To illustrate how the indexes of validity for each item influence the validity of the total test, consider the following formula:<br />

r_xy = Σ r_ic s_i / Σ r_it s_i<br />

Where: r_xy is the validity of the test; r_ic s_i, the item validity index, is the point-biserial item-criterion correlation multiplied by the item standard deviation; and r_it s_i, the item reliability index, is the point-biserial item-test correlation multiplied by the item standard deviation.<br />

From one of our validity studies, the validity coefficient for the total test was .52. It was found that Σ r_ic s_i was equal to 5.551, and Σ r_it s_i was equal to 10.577. Entering the above equation we find:<br />

r_xy = 5.551 / 10.577<br />

Solving the above, r_xy equals .525, which corresponds to the .52 which was the validity of the total evaluation test computed from the Pearson product-moment formula. Use of the index of validity promotes the soundness of the "item-bank" concept, wherein numerous previously tried items are readily accessible to the test constructor.<br />
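The estimate, and the check against the reported sums, can be sketched as follows (names are ours):

```python
def estimated_test_validity(validity_indexes, reliability_indexes):
    """Estimate total-test validity from item indexes:
    r_xy = sum(r_ic * s_i) / sum(r_it * s_i)."""
    return sum(validity_indexes) / sum(reliability_indexes)

# The sums reported above reproduce the observed coefficient:
r_xy = 5.551 / 10.577   # about .525, vs. .52 by the product-moment formula
```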

Another useful piece of information that will be provided the test specialists is a graphic plot of the item validity indexes and the item reliability indexes for each of the items. One can, therefore, tell at a glance which items are contributing most to test validity and test reliability. This is essentially the same kind of graphic demonstration shown by Gulliksen in his chapter on item analysis.<br />

Now these are the major changes in our validation studies which we have presently incorporated. Since incorporating this method, we have completed or nearly completed studies on approximately seven different military occupational specialties. Fortunately, nearly all of the tests evaluated so far possess some degree of empirical validity.<br />


One of the problems that we probably all encounter in our tests is high intercorrelations between subject-matter areas. Future test validation efforts at USAEEC will encompass further changes aimed at helping to alleviate this situation. Essentially, we are now planning a factor analytic approach to test development. The Wherry-Winer multifactorial item-analytic method has been selected for this purpose. It is a method for estimating factor loadings without the formidable task of computing a 100- to 125-variable intercorrelation matrix. As soon as computer programs are developed and computer time becomes available, the Evaluation Section will systematically begin extracting groups of items that should be scored together. We hope to obtain several orthogonal subtests.<br />

This approach has two major advantages for the Army. First, it should result in an improved diagnostic profile, because scores would tend to be independent of each other and thus would be more meaningful and more easily interpreted. A profile tends to be of limited use if the various profile areas being scored are highly intercorrelated. Maximum utility of profile scores demands relatively unique variables, which means that each variable or test area measures a factor univocally. We are hoping that the Wherry-Winer approach will help us achieve this end. A second advantage is that we will increase our knowledge about valid factors. If our various subtests are univocal, the measurement of the most valid factors can be emphasized, which we hope will lead to even greater increases in the validity of the total evaluation test.<br />

In summary, the use of the presently incorporated preliminary validity report and, in particular, the index of validity should help to improve the usefulness of our evaluation tests and increase the accuracy of personnel decisions based upon them. Our long-range research problem will be that of determining the possibility of obtaining orthogonal subtests within our evaluation tests.<br />


References<br />

Guilford, J. P. Reliability and validity of measures. Factor analysis. New York: McGraw-Hill, 1954, 470-538.<br />

Guilford, J. P. Validity of measurements. Fundamental statistics in psychology and education. New York: McGraw-Hill, 1956, 461-486.<br />

Gulliksen, H. Item selection to maximize test validity. Proceedings of the 1948 invitational conference on testing problems. Princeton, N.J.: Educational Testing Service, 1948, 13-17.<br />

Wherry, R. J., & Winer, B. J. A method for factoring large numbers of items. Psychometrika, 1953, 18, 161-179.<br />


The Prediction of Job Proficiency<br />

ALEXANDER A. LONGO<br />

Staff Research Department of the Chief of Naval Air Training, Naval Air Station, Memphis, Tennessee<br />

The development of this paper proceeded along the line of basic to applied research in the prediction of job proficiency. Thorndike, in "Personnel Selection" (1949), distinguished between proficiency on the job per se, and proficiency in job training. Only the latter type of proficiency is considered here. Two areas of research are examined: i.e., (1) basic research on "motivation" in a training environment; (2) applied research on the prediction of school achievement. The first area involves correlational techniques which are considered to be an improved approach to the measurement of nonaptitude variance. Included in nonaptitude variance, of course, is a certain degree of "motivation" variance, which has generally been elusive to measurement in a pure form. However, as we shall see, some interesting things can be done in "motivation" measurement by the use of part and partial correlation procedures. The application of information obtained from partial correlation procedures to individuals by the use of ratio-derived scores will also be discussed. The second area involves the use of multiple correlation and pass/fail frequency procedures to obtain the best prediction of technical training performance. In this regard, various selection and training grades were assessed for their potential to predict final training performance. Operationally, these two approaches to the prediction of job proficiency may be considered to overlap; however, as we shall see, there are certain distinctions between them which justify their independent consideration in the measurement of job proficiency. Several studies conducted by the Staff Research Department at the Naval Air Station, Memphis, on these two areas will form the subject matter of this paper.<br />

I would like to outline the contents of the paper at this time in order to give you the focal points of the material I will be presenting.<br />

I. The Measurement and Prediction of Motivation<br />
A. Residual Gain<br />
B. Ratio Index<br />
C. Motivation Measurement<br />
D. Application of the Achievement Index (an applied research study)<br />

II. The Measurement and Prediction of School Achievement<br />
A. Correlational Prediction<br />
B. Pass/Fail Odds in Prediction<br />


The Measurement and Prediction of Motivation<br />

By way of introduction to the studies on the measurement of motivation, two statistical concepts need to be defined to fit our context: residual gain and the actual-to-predicted ratio.<br />

A. Residual Gain. "Residual gain" was described by DuBois in his text on "Multivariate Correlational Analysis" (1957) as the residual which remains when the variance which an initial score has in common with the final score is partialed out of the final score. Actually, Thorndike, Bregman, Tilton, and Woodyard, in a text on "Adult Learning" (1928), are credited with the first research employing residual gain. Historically, "residual gain" was developed and used in a context of gain in learning. The classic question in learning research has been: "What are the correlates of learning?" This assumes, of course, that learning is not to be confused with intelligence, achievement, and other factors for which tests have been devised. However, several traditional problems have been associated with finding the correlates of learning: (a) How can gain be expressed to avoid metric differences between initial and final measurements? (b) How can we hold constant the known differences among individuals in learning (or gain in achievement) in the final measurement? and (c) Are the usual low correlations among gain measures due to the multiplex nature of gain, or perhaps due to inadequate measures of gain? "Residual gain," derived by part correlation techniques, appears to provide an adequate solution to each of these problems. This may not appear relevant to the matter in question here, namely, the measurement of motivation in a training environment. However, the evolution of "residual gain" as an index of learning ability was directed eventually toward the measurement of motivation by a minor change in the basic design utilized in the correlational studies on learning. Therefore, in order to trace the role of "residual gain" in research on motivation, it is necessary to digress a little further on the nature of "residual gain" itself.<br />

The basic paradigm utilized in studies on "residual gain" involves (1) pre-testing of achievement or skill in a particular area. (2) Subsequent to this, a given block of related instruction is introduced experimentally, or is provided by a technical school as a logical sequence to the pre-test itself, as the circumstances dictate. (3) Lastly, a final grade is obtained for this given block of instruction. The problem presented here is to assess the degree of learning that took place between the pre-test and post-test. This can be done several different ways (i.e., crude gain, % gain, etc.); however, "residual gain" appears to be a more satisfactory statistic of gain in learning.<br />


The basic formula which generates the part correlation between "residual gain" and a third variable is as follows:<br />

r_(2.1)3 = (r_23 - r_12 r_13) / √(1 - r_12²)<br />

Where: 1 = pretest, 2 = posttest, 3 = outside variable.<br />

The basic formula which generates the intercorrelation between any two sets of residual scores, i.e., 2.1 and 4.3, is as follows:<br />

r_(2.1)(4.3) = (r_24 - r_12 r_14 - r_23 r_34 + r_12 r_13 r_34) / √[(1 - r_12²)(1 - r_34²)]<br />

Where: 1, 3 are initial scores on two different tests; 2, 4 are final scores on two different tests.<br />
This brings us to the point, which I indicated earlier, regarding the role of "residual gain" in motivation research. The formulas just discussed provide a technique to divide the measured variance in skill or achievement into two parts: a part which is completely predicted by measured skill or achievement obtained at some earlier point, and a part which is unpredicted by the earlier measure. Due to the nature of the variables and the basic paradigm outlined above, the logic of the situation leads us to posit that the unpredicted variance, or "residual," is representative of "gain" in learning. However, if instead of using an achievement or skill measure as our pre-test we used an aptitude measure, then the logic of the situation would lead us to posit that the residual is more synonymous with "motivation" than "gain" as described above. Thus, nothing has changed except the nature of the pre-test. The value of this residual as a predictor and criterion, both in the measurement and prediction of motivation, will be illustrated below.<br />

B. Ratio Index. The second concept of importance in our treatment of motivation research is the ratio of actual to predicted achievement. This has been referred to popularly as: overachievement/underachievement; achievement index; discrepancy score; PAQ (preparatory achievement quotient); et al.<br />


DuBois (1957) has shown that "a properly constructed ratio (such as that described above) is essentially a special case of a residual, i.e., the value in the numerator variable less that part which is correlated with the denominator variable. Accordingly, a ratio should correlate .00 with the denominator variable." This has practical significance in that we can perform our research in the form of part and partial correlation techniques, and utilize the results, if indicated by the data, in the form of a ratio index. Traditionally, this index has been derived by dividing a student's observed performance (i.e., any training grade) by his predicted performance. One could divide by a student's raw aptitude grade, but it is considered to be a further refinement to use a predicted score derived on the basis of some aptitude grade. In either case, the resultant index is uncorrelated with aptitude (i.e., .00). Thus, similar to the part correlation, the ratio approach divides a training grade into two parts: that variance which is predicted by aptitude, and that variance which is unpredicted by aptitude (i.e., the achievement index). The achievement index will be 1.00, or greater or less than 1.00, depending on the magnitudes of the numerator and denominator scores.<br />
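The ratio index, including the refinement of dividing by a regression-predicted score rather than a raw aptitude grade, can be sketched as follows (the function names and the simple least-squares predictor are our illustrative choices):

```python
def predicted_from_aptitude(aptitudes, grades):
    """Least-squares prediction of training grades from aptitude scores."""
    n = len(aptitudes)
    mean_a = sum(aptitudes) / n
    mean_g = sum(grades) / n
    cov = sum((a - mean_a) * (g - mean_g)
              for a, g in zip(aptitudes, grades)) / n
    var_a = sum((a - mean_a) ** 2 for a in aptitudes) / n
    slope = cov / var_a
    return [mean_g + slope * (a - mean_a) for a in aptitudes]

def achievement_index(observed, predicted):
    """Ratio of actual to predicted achievement (the PAQ-style index);
    values above 1.00 suggest overachievement, below 1.00 the reverse."""
    return observed / predicted
```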

It may be beneficial to clarify two matters at this point in order<br />
to define this index properly. The achievement index is similar to the<br />
"residual gain" as we indicated. The former measure is indicative of<br />
"motivation," and the latter, of "gain" in learning. One may infer<br />
that both measures are "pure" indices of motivation and gain in learning.<br />
This should not be assumed, since they are both residuals representing<br />
variance unexplained by measured aptitude on the one hand and achievement<br />
on the other. DuBois in "Correlational Analysis in Training Research"<br />
(APA Paper, 1956) and Mayo in "PAQ Manual" (a CNATECHTRA Research Report,<br />
1957), respectively, indicate that each residual is not a "pure" measure<br />
of gain or motivation.<br />

The second problem often encountered concerns the predictive<br />
capacity of the achievement index. This can be viewed in two ways. On<br />
the one hand, a training grade, as indicated, can be divided into two<br />
parts which are uncorrelated with each other. Therefore, in a correlational<br />
sense, each part contributes uniquely to the prediction of an<br />
outside criterion. However, since these two parts are based on the same<br />
training grade, it should not be assumed that the combined prediction<br />
by these parts will be greater than prediction by that training grade<br />
itself. This was discussed in a recent CNATECHTRA research report by<br />
Longo titled "An Appraisal of Ratio Scores." Data on this matter is<br />
contained in Table 1. Two schools are represented. It can be seen that<br />
the achievement index (also called PAQ) does not add to the Multiple "R"<br />
as compared to the simple "r" when using the training grade alone. Thus,<br />
without PAQ the simple r's were .761 and .767; with PAQ, the Multiple<br />
R's were .762 and .763, respectively. However, there are other practical<br />


reasons for dividing a training grade into two parts. This involves the<br />
use of the achievement index for diagnostic purposes and will be treated<br />
in Section D to follow.<br />

Table 1<br />
A Comparison of Simple and Multiple Correlation Using<br />
the Ratio Index for AFU(A) and ATR(A) Schools<br />

Predictors       School    (N)     r / R<br />
AFU(A)           ATR(A)   (414)    .767<br />
PAQ, AFU(A)      ATR(A)   (414)    .763<br />

C. Motivation Measurement. Having defined our working concepts<br />
of "residual gain" and achievement ratio, I would like now to present<br />
some research findings involving the use of both approaches in motivation<br />
research. The research was conducted by the CNATECHTRA Staff Research<br />
Department located at Memphis.<br />

1. PAQ Manual. Mayo, G. D., CNATECHTRA Research Report, 1957.<br />

a. Nature of Achievement Index:<br />

The first question relates to the nature of the achievement<br />
index. As we indicated, the logic of the paradigm wherein aptitude<br />
is partialed out of a training grade leads us to hypothesize that the<br />
variance remaining is related to motivation or effort-put-forth. Of<br />
course, it is also recognized that other factors contribute to this<br />
unpredicted variance: i.e., error, personality differences, interest<br />
and attitudes, and previous achievement in a particular area and ability<br />
not measured, etc. The task, therefore, is to determine to what extent<br />
motivation is contained in the residual variance.<br />

The method adopted to resolve this question was the use of peer<br />
ratings. The classic use of peer ratings was employed wherein classes<br />
of 15 to 25 men were asked to nominate and rank three students in their<br />


section who were trying the hardest to master the course and, vice versa,<br />
for three men who were trying the least hard. The objective was to use<br />
an early achievement index to predict effort as measured by peer ratings<br />
later in training. (The results are contained in Table 2a.) The correlations<br />
of these two measures range from .17 to .40 for four technical<br />
training schools and are all significant.<br />
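As a check on the significance claim, the usual t statistic for a correlation, t = r·sqrt(N - 2)/sqrt(1 - r²), can be applied to the smallest value in Table 2a; even r = .17 with N = 209 clears the two-tailed .05 criterion (roughly t = 1.97 at these degrees of freedom):

```python
import math

def t_for_r(r, n):
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Smallest reported correlation: r = .17, Electronics Technician, N = 209.
print(round(t_for_r(.17, 209), 2))  # ≈ 2.48, beyond the .05 criterion (~1.97)
```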

Table 2a<br />
Correlation of Achievement Index with<br />
Peer Ratings on Effort<br />

Aviation School               N     r<br />
1. Engine Mechanic           166   .40<br />
2. Structural Mechanic       161   .U<br />
3. Electronics Technician    209   .17<br />
4. Training Deviceman        266   .39<br />

Since peer ratings themselves are not immune to halo effects for<br />
reasons such as intelligence or class achievement, further investigation<br />
was made to remove the effect of ratings on intelligence from the<br />
correlation of ratings on effort and the achievement index. This was<br />
done on a sample of 166 students in the Engine Mechanic School at Memphis<br />
(cf. Table 2b). The resultant partial correlation was indicated to be<br />
.16. When compared to the original correlation of .40 for those same<br />
variables we can see that halo did influence ratings on effort - but the<br />
partial correlation was still significant at the .05 level. While more<br />
research needs to be conducted in this area, it can be said that some of<br />
the variance in the achievement index has been identified as being<br />
related to effort.<br />
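The computation here follows the standard first-order partial correlation formula; with the two-decimal values from Table 2b (effort with index .40, effort with intelligence ratings .66, intelligence ratings with index .46) it works out to about .14, close to the value reported in the text once rounding of the inputs is allowed for:

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Rounded values from Table 2b: x = peer rating on effort,
# y = achievement index, z = peer rating on intelligence.
r_xy, r_xz, r_yz = .40, .66, .46
print(round(partial_corr(r_xy, r_xz, r_yz), 2))  # → 0.14
```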


Table 2b<br />
Intercorrelations Among Peer Ratings on Effort and<br />
Intelligence and Achievement Index<br />

                                  1.    2.    3.    4.<br />
1. Peer Ratings on Intelligence    -   .66   .46   .46<br />
2. Peer Ratings on Effort               -    .29   .40<br />
3. Intelligence (GCT)                         -    .10<br />
4. Achievement Index                                -<br />

b. Stability of the Achievement Index:<br />

The second question involves the reliability of the<br />
achievement index. Information on this question was drawn from another<br />
study involving 196 students from the Aviation Structural Mechanic School<br />
at Memphis. Correlations were obtained between the achievement index<br />
derived early in training and at the end of training in two schools.<br />
The correlations obtained were .69 and .35. This indicates a moderate<br />
stability in the achievement index. At least, it indicates that the<br />
residual is not composed entirely of random error.<br />

2. Motivation Measurement. Mayo and Manning, Educational and<br />
Psychological Measurement, Volume 21, Number 1, 1961.<br />

This study investigated another basic aspect of the achievement<br />
index, namely the prediction of a subsequent achievement index by a<br />
variety of motivation measures including an early achievement index.<br />
Actually, the achievement index in this instance took the form of a<br />
part correlation instead of a ratio of observed to predicted achievement.<br />
As we indicated before, these two statistics are essentially<br />
similar.<br />

The motivation measures employed were:<br />
a. Peer ratings on effort<br />
b. Self ratings on effort<br />
c. Check list of student behavior<br />
d. Pictorial measures<br />


e. Grades in an earlier course with aptitude removed.<br />
(This is the residual which is basically similar<br />
to the ratio achievement index and was tentatively<br />
used as a measure of motivation.)<br />

The analysis proceeded in three steps. First, intercorrelations<br />
among the ten variables were obtained (cf. Table 3). Secondly, the<br />
three aptitude measures were partialed out of each of the remaining<br />
variables, which were then intercorrelated as partials (cf. Table 4).<br />
Lastly, multiple correlations were computed between the residualized<br />
motivation and school grade predictors on the one hand and the residualized<br />
criterion variables on the other hand (cf. Table 5). As you<br />
can see, in the latter table, all the predictor residuals combine to<br />
predict both criterion residuals very well. The maximum multiple is<br />
.70 for criterion #1, and .44 for criterion #2; however, the primary<br />
variables involved in the multiple partial correlation were the peer<br />
rating residual and the fundamentals school grade residual; i.e.,<br />
X4.123 and X8.123, which yielded multiple partial correlations of .67<br />
and .41, respectively.<br />
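The two-predictor multiple correlation quoted here can be approximately reproduced from the standard formula for two predictors; using the Table 4 values for the peer-rating and fundamentals residuals against the sheet-metal residual (.60, .49, and .29 between the predictors) gives about .68, essentially the reported .67 once rounding of the two-decimal table entries is allowed for:

```python
import math

def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors, computed
    from the three pairwise correlations."""
    r_sq = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_sq)

# Table 4 values: peer-rating residual (X4.123) and fundamentals-grade
# residual (X8.123) against the sheet-metal residual (X9.123).
print(round(multiple_r(.60, .49, .29), 2))  # → 0.68
```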

Table 3<br />
Intercorrelations, Means, and Standard<br />
Deviations of Variables (N=196)<br />

Measure                      1   2   3   4    5   6   7   8   9  10    Mean   S.D.<br />
1. GCT                       -  29  29  11  -09  04  15  36  33  31   55.61   6.61<br />
2. ARI                           -  04  00  -10  03  08  38  20 -01   55.54   5.82<br />
3. MECH                              -  16   08  10  05  22  43  38   56.26   6.76<br />
4. Peer Ratings                          -   28  19  00  30  59  34   29.24   9.60<br />
5. Self Ratings                               -  -03 -10  09  19  04  30.20   9.49<br />
6. Check List of Student Behav.                   -  02  11  22  22   97.71  11.47<br />
7. Pictorial Motiv                                    -  09  01  03    7.18   2.27<br />
8. Mech Fund                                              -  57  40   81.70   6.44<br />
9. Sheet Metal Unit                                           -  61   79.37   4.89<br />
10. Welding Unit                                                  -   82.34   5.24<br />

Note: Decimal points omitted<br />


Table 4<br />
Intercorrelations of Motivation and Criterion<br />
Measures, Aptitude Partialed Out<br />

Measure                    X4.123  X5.123  X6.123  X7.123  X8.123  X9.123  X10.123<br />
X4.123  Peer Ratings          -      29      18     -02      29      60      30<br />
X5.123  Self Ratings                  -     -03     -09      16      18      05<br />
X6.123  Checklist                             -      01      09      20      20<br />
X7.123  Pictorial Meas.                               -      03     -04     -02<br />
X8.123  Mech Fund                                             -      49      35<br />
X9.123  Sheet Metal Unit                                              -      43<br />
X10.123 Welding Unit                                                          -<br />

Note: Decimal points omitted<br />

Table 5<br />
Multiple-Partial Correlation Coefficients<br />
Between Motivation and Criteria Variables<br />

                                               Criteria<br />
Motivation Measures                     Sheet Metal Unit   Welding Unit<br />
                                            (X9.123)        (X10.123)<br />
X4.123, X5.123, X6.123, X7.123, X8.123        .70              .44<br />
X4.123, X5.123, X6.123, X7.123                .62              .34<br />
X4.123, X8.123                                .67              .41<br />


D. Application of the Achievement Index. I would like to turn now<br />
to a study on an application of the achievement index in a training situation.<br />
As we indicated, the achievement index can most easily be derived for<br />
individuals by means of a ratio of actual divided by predicted achievement.<br />

1. A Note on Under and Overachievement. Froehlich, H. P. and<br />
Mayo, G. D. Personnel and Guidance Journal, March 1963.<br />

This journal article represents a summary of the research on the<br />
achievement index and should be apropos also to the termination of my<br />
own remarks on this subject. Froehlich indicates two applications of<br />
the achievement index: (a) Research and (b) Prediction. We have dwelt<br />
at length on the potential this index has for purposes of basic research<br />
on learning gain and motivation, depending on the nature of the variable(s)<br />
partialed out; i.e., whether achievement or aptitude is residualized. As<br />
regards its predictive aspects, we also have indicated that the achievement<br />
index does predict training criteria but does not add to the training<br />
grade. As noted, it represents the unpredicted variance of a training<br />
grade which, when combined with the predicted variance of that grade,<br />
sets the multiple "R" as equal to the simple "r" (by using the training<br />
grade alone) in predicting an external criterion. Froehlich clearly<br />
points that fact out, but emphasized that the merit of this dichotomy<br />
lies in its isolation of variance which was not predicted by the ability<br />
measures utilized. Thus, when derived early in training, the achievement<br />
index is considered to have merit in counseling "low-achievers" (i.e.,<br />
those who scored below 1.00). The study indicated that, "if the unpredicted<br />
part of academic achievement measures motivational variance, it might be<br />
hypothesized that if underachievers of various ability levels could be<br />
motivated to put out more effort, have more interest, or take on more<br />
positive attitudes toward their studies, some might improve their final<br />
grades."<br />

In resume, it can be said that the achievement index has a great<br />
potential for basic research in the difficult area of measuring non-aptitude<br />
variance such as motivation. The partial and part correlations offer a<br />
good technique to isolate and identify the correlates of this residual.<br />
Some use of this index in the form of a ratio has been made in naval air<br />
technical training for counseling purposes. It is considered that further<br />
research into the nature of the achievement index will prove to<br />
extend its role in an applied sense.<br />

II. Prediction of School Achievement<br />

The second approach to job proficiency measures is the prediction<br />
of school achievement. This employed the classical regression<br />
and pass/fail frequency procedures to derive predictions and approximate<br />
odds to succeed in various schools in the Naval Air Technical Training<br />
Command. The variables utilized are the available selection and training<br />
grades. The technical schools involved were the mechanical and avionics<br />
ratings. Briefly, I would like to present an example of the correlational<br />
tables and also the pass/fail frequency tables and their role in the<br />
quality control of trainees.<br />


A. Correlational Prediction:<br />

Tables 6 and 7 contain the correlations of the training grades<br />
utilized at two different points in the same school (AFU-A). These<br />
correlations yielded the regression equations necessary to derive the<br />
predicted scores contained in Figures 1 and 2. Prediction tables<br />
such as these, rather than regression equations, greatly assist the<br />
training officers to grasp the meaning and use of predicted scores.<br />
Its application, of course, consists in: 1. Determining the student's<br />
grades on the variables across the top and left side of Figures 1<br />
and 2; and 2. Locating the predicted school final average within<br />
the prediction tables.<br />
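A sketch of how such a prediction table can be generated from the regression equation given with Figure 1 (X3 = .196·X1 + .53·X2 - 1.77, where X1 is the Basic Test Battery composite and X2 the AFAM final average); the row and column ranges mirror the figure:

```python
# Regression equation from Figure 1: predicted AFU(A) final average (X3)
# from the Basic Test Battery composite (X1) and AFAM final average (X2).
def predicted_final(x1, x2):
    """Predicted AFU(A) final average, rounded when tabulated."""
    return .196 * x1 + .53 * x2 - 1.77

btb_rows = range(200, 115, -5)   # left-hand column of the table: 200 .. 120
afam_cols = range(53, 99, 3)     # headings across the top: 53 .. 98

for x1 in btb_rows:
    print(x1, [round(predicted_final(x1, x2)) for x2 in afam_cols])
```

Reading a cell of the printed grid corresponds to step 2 of the application described above: locate the row for the student's BTB composite, the column for his AFAM average, and take the entry as the predicted final average.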

Table 6<br />
Intercorrelations of Variables Utilized<br />
at the End of Aviation Familiarization School<br />
AFU(A)<br />
(N=1342)<br />

              2      3       M      S.D.<br />
BTB     1.  .374   .392   177.63   12.32<br />
AFAM    2.         .478    76.72    7.03<br />
AFU(A)  3.                 73.80    9.73<br />

Table 7<br />
Intercorrelation of Variables Utilized<br />
at the End of Phase 1 of AFU(A) School<br />
AFU(A)<br />
(N=1322)<br />

              2      3      4       M      S.D.<br />
BTB      1. .365   .367   .381   177.79   12.29<br />
AFAM     2.        .389   .461    76.86    6.94<br />
Phase 1  3.               .851    76.27    9.01<br />
AFU(A)   4.                       74.09    9.32<br />

Figure 1<br />
AFU(A) Prediction Table<br />
(N=1342)<br />

[Prediction grid: AFAM Final Average (X2), 53 to 98 in steps of 3, across<br />
the top; Basic Test Battery composite (X1), 200 down to 120 in steps of 5,<br />
at the left; each cell gives the predicted AFU(A) final average (X3), with<br />
a dotted staircase marking the cutting score.]<br />

R = .530<br />
σ1.23 = 8.23<br />
Regression Equation: X3 = .196X1 + .53X2 - 1.77<br />
Cutting Score: 65 (i.e., odds of 2 to 1 to pass)<br />
X1 = Basic Test Battery Composite<br />
X2 = AFAM Final Average<br />
X3 = AFU(A) Final Average (predicted)<br />


Figure 2<br />
AFU(A) Prediction Table<br />
(N=1322)<br />

[Prediction grid: Phase 1 Final Average (X1), 53 to 98 in steps of 3,<br />
across the top; AFAM Final Average (X2), 98 down to 50 in steps of 3, at<br />
the left; each cell gives the predicted AFU(A) final average (X3), with<br />
a dotted staircase marking the cutting score.]<br />

R = .863<br />
σ1.23 = 4.82<br />
Regression Equation: X3 = .8388X1 + .210X2 - 6.55<br />
Cutting Score: 64 (i.e., odds of 2 to 1 to pass)<br />
X1 = Phase 1 Final Average<br />
X2 = AFAM Final Average<br />
X3 = AFU(A) Final Average (predicted)<br />



B. Pass/Fail Odds of Prediction:<br />

The second method used to determine a student's chances of<br />
succeeding in training is based on the observed pass/fail frequencies<br />
for various intervals of grades most correlated with training performance.<br />
Tables 8 and 9 of the handout contain the pass/fail frequency<br />
data for 2 variables taken at different points in training in<br />
the Avionics Fundamentals School at Memphis. Data in Table 8 is based<br />
on an "N" of 1340 and a correlation of .48, and data in Table 9 on<br />
an "N" of 1291 and a correlation of .85. The pass/fail frequencies<br />
are translated into approximate odds to succeed in that particular school.<br />
The same data was translated into two graphs to facilitate the training<br />
officer's interpretation and application of these pass/fail frequency<br />
tables. These graphs are contained in Figures 3 and 4 and are self-explanatory,<br />
with the examples given within each table.<br />
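The translation from pass/fail counts to the approximate-odds form used in these tables can be sketched as follows; the helper function and the counts fed to it are illustrative, chosen only to reproduce the "7 to 3 to pass" style of odds quoted with Figure 3, not the report's exact frequencies:

```python
from fractions import Fraction

def approx_odds(graduates, attrites):
    """Percent graduating and approximate pass odds for one grade interval.
    Illustrative helper; not taken from the report."""
    pct = round(100 * graduates / (graduates + attrites))
    if attrites == 0:
        return pct, "no attrites"
    # Reduce the raw ratio to a small, readable odds statement.
    odds = Fraction(graduates, attrites).limit_denominator(9)
    return pct, f"{odds.numerator} to {odds.denominator} to pass"

print(approx_odds(70, 30))  # → (70, '7 to 3 to pass')
```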

The predicted scores, based on selection and training grades, have<br />
proved to be a valuable index to ultimate training performance. The<br />
application of these predictions is the<br />


Table 8<br />
The Probabilities of Graduating AFU(A) School<br />
after a Student Completes AFAM School<br />
(N=1340)<br />

[Tabulated by AFAM final-average interval (94+ down through 53 and below):<br />
percent of total (.1 to 21.0), AFU(A) graduates, AFU(A) attrites,<br />
approximate odds for/against graduating, and percent graduating.<br />
Totals: 1158 graduates, 182 attrites, 86 percent graduating.]<br />

Table 9<br />
The Probabilities of Graduating AFU(A) School<br />
after a Student Completes Phase 1<br />
(N=1291)<br />

[Tabulated by Phase 1 final-average interval (94+ down through 54-57):<br />
percent of total, AFU(A) graduates, AFU(A) attrites, approximate odds<br />
for/against graduating, and percent graduating. The upper intervals<br />
graduated essentially all students (100 down through 99 percent), with<br />
the percent graduating falling through 93, 78, 49, and 24 percent in the<br />
lowest intervals. Totals: 1158 of 1291 graduating, about 90 percent.]<br />



Figure 3<br />
Summary utility chart of probability data<br />
contained in Table 8<br />

[Graph: AFAM Final Average (60 to 90) on the horizontal axis against<br />
percent graduating, with an approximate-odds scale (even; 2 to 3;<br />
3 to 7; 1 to 4; 1 to 9) at the right. Use: 1. Identify the AFAM grade<br />
(example: 64); 2. Intersect the probability curve; 3. Interpret the<br />
odds for success (grade 64: 7 to 3 to pass).]<br />


Figure 4<br />
Summary utility chart of probability data<br />
contained in Table 9<br />

[Graph: Phase 1 Final Average (70 to 100) on the horizontal axis against<br />
percent graduating, with an approximate-odds scale (9 to 1; 4 to 1;<br />
7 to 3; 3 to 2; even; 2 to 3; 3 to 7; 1 to 4; 1 to 9) at the right.<br />
Use: 1. Identify the Phase 1 grade (example: 66); 2. Intersect the<br />
probability curve; 3. Interpret the odds for success (grade 66: 3 to 2<br />
to pass).]<br />


References<br />

DuBois, P. H., Multivariate correlational analysis. New York: Harper &<br />
Brothers, 1957.<br />

DuBois, P. H. & Manning, W. H., Methods of research in technical training.<br />
ONR Contr. # Nonr 816(02), NATTC, Nav. Ed. Tech. Rept. 83, April 1961.<br />

Froehlich, H. P. and Mayo, G. D., A note on under and overachievement.<br />
Personnel and Guidance Journal, March 1963.<br />

Longo, A. A., The prediction of student performance in five mechanical<br />
schools. Research Report. Staff, CNATECHTRA, NAS Memphis, Tennessee,<br />
December 1963.<br />

Longo, A. A., An appraisal of ratio scores. Research Report. Staff,<br />
CNATECHTRA, NAS Memphis, Tennessee, March 1964.<br />

Longo, A. A., The prediction of student performance in seven avionics<br />
schools. Research Report. Staff, CNATECHTRA, NAS Memphis, Tennessee,<br />
August 1964.<br />

Mayo, G. D., PAQ Manual. Research Report. Staff, CNATECHTRA, NAS<br />
Memphis, Tennessee, 1957.<br />

Mayo, G. D. and Manning, W. H., Motivation measurement. Educational<br />
and Psychological Measurement, XXI, 73-83, 1961.<br />

Thorndike, E. L.; Bregman, E. O.; Tilton, J. W. and Woodyard, E., Adult<br />
learning. New York: Macmillan Co., 1928.<br />



Personality Testing and Job Proficiency<br />

JOHN D. KRAFT<br />

US Army Enlisted Evaluation Center<br />

Chairman Bridges, Fellow Conferees,<br />

It is a privilege to lead this discussion on personality testing<br />
and job performance.<br />

As a point of departure in exploring ways of increasing the prediction<br />
of soldiers' performance, I will give a general discussion of<br />
the research which Raymond B. Cattell and his associates at the University<br />
of Illinois have conducted for measuring motivational strength. Also, I<br />
will discuss some of the uses which we might put Cattell's research, and<br />
related motivational research, to in evaluating the total percent of<br />
variance in the job performance of the individual serviceman.<br />

Currently, the Army uses a job "mastery" test to measure what the<br />
soldier "can do." It uses a rating scale to measure what the soldier<br />
"will do." Unfortunately, even though years of hard labor have been<br />
expended by industrial psychologists in trying to refine the rating scale<br />
and to minimize its errors, it still is a weak evaluating instrument.<br />
Individuals differ in their ability to accurately perceive the characteristics<br />
of others. Some of the factors relevant to this differential<br />
perception are in the age, sex, and personality of the perceiver, in<br />
the characteristics of the perceived, and in the content areas in which<br />
the predictions are made.<br />

The rating scale's "will do" areas are motivational in content. They<br />
show the "drive" or "sentiment" factors of the soldier's performance. Mr.<br />
Shirkey, who spoke before you a few minutes ago, recently did a factor<br />
analysis of the Army's new CER (rating scale). He found five factors<br />
which can tentatively be called: Military Bearing I and II, Military<br />
Leadership Potential, Initiative, and Task Motivation. Mr. Shirkey was<br />
surprised to find that the supervisory portion of the test correlated<br />
rather poorly with the Leadership Potential Factor of the CER. Although<br />
it is possible that these results are partially a condition of the particular<br />
MOS analyzed, it is evident that the CER is measuring the motivational<br />
areas concerned with what the soldier "will do."<br />

Since most ratings are spurious and do vary with the rater, consider<br />
how much better it would be if predictive measures of the motivational<br />
strength in these areas could be made in order to give more<br />
objectivity and exactness. An example of the possible work that can<br />
be done along these lines is that of the research currently being<br />
conducted by the US Army Personnel Research Office on the values associated<br />
with military career motivation. The researchers there have<br />


identified the strength of six dimensions of motivational values or<br />
factors which are useful in predicting whether an officer or an enlisted<br />
man will plan on staying in the military service as a career. These<br />
factors, which are tested by objective-type tests, are support, conformity,<br />
recognition, independence, benevolence, and leadership. Also, they are<br />
currently conducting extensive research in developing devices which will<br />
predict which officers and enlisted men will be the most proficient<br />
during their Army careers.<br />

Cattell and his co-workers have spent the last fifteen years in<br />
trying to develop objective measures of motivational strength. They<br />
studied the major evaluative instruments produced by others, their devices,<br />
and the known general psychological principles in the areas of motivation,<br />
learning theory, etc. (An example of the kind of principle referred to<br />
here is that of information. In general, i.e., after we discount the<br />
influence of intelligence and general breadth of interest, a person knows<br />
more about what he is interested in, more about those courses of action<br />
to which he is committed, than he does about what he is not interested<br />
in.) Cattell developed some seventy-five devices of his own and postulated<br />
some sixty new principles. From this mass of material, he found<br />
through factor analytic studies seven basic motivational components.<br />
Three of these seem to correspond in content to the functions ascribed by<br />
Freud to the Id, Ego, and Superego. The others have been named Physiological<br />
Interest, Repressed Complexes, Impulsivity or Urgency, and<br />
Persistence. These qualities can be ascribed in varying proportion to<br />
any drive. Second order factor analysis of these motivational components<br />
resulted in two second order factors: (1) Integrated, or that which is<br />
essentially a conscious and experienced expression, and (2) Unintegrated,<br />
or that which is essentially unconscious and mainly wishing and tension.<br />

The two most important working concepts of Cattell in his theoretical<br />
treatment of motivation have been termed "erg" and "sentiment,"<br />
respectively, for the constitutional and acquired patterns found<br />
operationally as factors in dynamic measures by experiments. The term<br />
"erg" is used instead of drive because the latter term drags in all<br />
manner of clinical and other assumptions about "instincts" and so on,<br />
whereas the ergic patterns are experimentally demonstrable. However, in<br />
popular terms an erg is a drive or source of reactive energy directed<br />
toward a particular goal, such as fear, mating, assertiveness, and so on.<br />
By contrast, a sentiment is an acquired aggregate of attitudes built up<br />
by learning and social experience, but also, like an erg, a source of<br />
motivational energy and interest. Both ergs and sentiments, though<br />
essentially common in form, are developed to different degrees in different<br />
people. Cattell and his co-workers found factor analytic evidence for some<br />
twenty motivational dimensions in a broad sampling of variables.<br />

Some of the major ergic patterns or dynamic factors for which Cattell found evidence through his research are: Escape, with associated emotion of fear; Mating, with associated emotion of sex; Self-assertion, with associated emotion of pride; Gregariousness, with associated emotion of loneliness; Appeal, with associated emotion of despair; Exploration, with associated emotion of curiosity; Rest seeking, with associated emotion of sleepiness or fatigue; etc. Some of the major sentiment patterns or dynamic factors for which he found evidence are: Self; Superego; Career (levels of aspiration); Sports and games; Mechanical interest; etc.<br />

Cattell and his associates then developed the experimental Motivational Analysis Test to measure the tension operative in the ergs and sentiments that until now had been assessed only by rough means. Ten dynamic factors were chosen which, as a result of careful correlational and factor analytic research, were felt to be of greatest value to test users as being representative, and comprehensive in coverage, of adult motivations.<br />

In this test, Cattell measured the strength of each of these ten dynamic factors by using forced-choice attitude-interest questions. The particular attitude-interests he used were carefully chosen because they were found to be substantially related to and therefore best suited to represent the factors concerned. For example, he found the following attitudes and their motivational paradigms for the dynamic factor of self-sentiment:<br />

1. Good reputation - "I want to maintain a good reputation and a common respect in my community."<br />

2. Normal sex - "I want a normal, socially acceptable relation to a person of the opposite sex."<br />

3. Look after family - "I want to look after my family so that it reaches approved social standards."<br />

4. Proficient career - "I want to be proficient in my career."<br />

5. Control impulses - "I want to keep my impulses under sufficient and proper control."<br />

6. Self respect - "I want never to damage my sense of self respect."<br />

The motivational strength of these attitude-interest paradigms for each dynamic factor was measured by using four different forced-choice devices or subtests. These subtests, which are a little unusual in nature, are called: (1) Uses; (2) Estimates; (3) Paired Words; and (4) Information. They are illustrated in the following chart.<br />


Subtests (Examples)<br />

1. "Ends-for-means" (Projection) or "Uses"<br />
A free afternoon could best be spent:<br />
a. enjoying the out-of-doors<br />
b. earning overtime<br />

2. "Autism" or "Estimates"<br />
How many years does it take to make a secretary efficient?<br />
a. 2 b. 4 c. 6 d. 8<br />

3. "Ready Association" or "Paired Words"<br />
Stamps<br />
Collect<br />
Evidence<br />

4. "Means-end Knowledge" or "Information"<br />
Who was the first president of the United States?<br />
a. Lincoln b. Jefferson c. Washington d. Roosevelt<br />

The scores from the four subtests are combined into two groups for each factor to give separate strength measures for: (1) the Integrated (or conscious) component (Information plus Paired Words), and (2) the Unintegrated (or unconscious) component (Estimates plus Uses). Thus, the test user can either use two scores for each of the ten dynamic dimensions in the M. A. T., or the Integrated and Unintegrated scores can be added to give a single total interest score for each of the ten dynamic dimensions.<br />

The final score on each of the ten dynamic structure factors is actually a sum both over the four devices, or methods of measurement, as just described, and also over distinct attitudes which are known to constitute good representatives of the dynamic trait. There are a few other scales developed with this test. Perhaps the most interesting one is the Conflict Scale. This scale shows the amount of conflict between the Integrated and Unintegrated scales.<br />

Cattell reported the scores in stens (standard ten scores) for each of these scales. The statistical procedure used in arriving at these scores is ipsative in nature. One can learn the total strength of each of these drives in the individual being tested and be able to make diagnostic predictions from them.<br />
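The scoring arithmetic just described can be sketched in code. This is only an illustration of the logic, not Cattell's actual scoring keys or norm tables: the function names, the example raw scores, and the simplified within-person sten conversion are all hypothetical.<br />

```python
# Illustrative sketch (not Cattell's published tables): combining the four
# M.A.T. subtest scores into component, total, and conflict scores, plus a
# simplified ipsative sten conversion. All names and numbers are hypothetical.

def component_scores(info, paired, estimates, uses):
    """Integrated = Information + Paired Words; Unintegrated = Estimates + Uses."""
    integrated = info + paired
    unintegrated = estimates + uses
    return {
        "integrated": integrated,
        "unintegrated": unintegrated,
        "total": integrated + unintegrated,     # single total interest score
        "conflict": integrated - unintegrated,  # crude conflict indicator
    }

def ipsative_stens(raw_by_factor):
    """Standardize each factor score against the person's own mean and
    spread (ipsative), then map onto the 1-10 sten scale (mean 5.5, SD 2)."""
    scores = list(raw_by_factor.values())
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5 or 1.0
    stens = {}
    for factor, s in raw_by_factor.items():
        sten = round(5.5 + 2 * (s - mean) / sd)
        stens[factor] = min(10, max(1, sten))  # clamp to the sten range
    return stens

if __name__ == "__main__":
    print(component_scores(info=6, paired=5, estimates=9, uses=8))
    raw = {"self-sentiment": 28, "superego": 14, "career": 21, "fear": 7}
    print(ipsative_stens(raw))
```

Because the standardization is within one person, the stens describe the relative prominence of his drives against his own profile, not against a population norm.<br />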


Now we should recognize that what Cattell is measuring here as a "drive strength" is most accurately referred to as an ergic tension level. That is to say, it is not some constitutional strength of that drive, but the end result, in actual motivational strength, of four or more influences. These are: (1) constitutional endowment, (2) early experiences (imprinting, repression), (3) current degree of environmental stimulation, and (4) current reduction of tension by satisfactions and degrees of recurrent need gratification existing in the present environment. These vary with time.<br />

The Motivational Analysis Test is still an experimental test. However, as more research is conducted on it, its theoretical and actual predictive ability, as derived from preliminary research results, will increase.<br />

In predicting what an adult will do, it is evidently quite as important to know the motivation available from these acquired sentiments as from his basic ergs. For example, the evaluation of a man's potential in a career will need to include measures of the strength of his interest in a career as such, also of his degree of concern about his general reputation and self-concept, and especially of the level of basic dependability implied by his level of superego development.<br />

The statistical predictions which research has so far obtained against life criteria have borne out these psychological interpretations of scores. Research has shown that dynamic structure trait strengths add to the prediction of scholastic achievement over and above what is predictable from ability and general personality traits. John Butcher, one of Cattell's associates, found that, other variables being controlled, students who are high in superego are high achievers, those who are low on radicalism are high achievers, and those who are high on assertiveness are high achievers; but he found no relationship with self sentiment. Also, he found that the career sentiment must be modified for the particular setting.<br />

Mr. Claude Bridges talked with Dr. Arthur Sweney a few weeks ago. Dr. Sweney, who is an associate of Cattell, said that the preliminary research shows that general motivational factors could contribute a great deal as predictors of achievement in specific occupational specialties. He said that it would probably be necessary for us to develop specific instruments for groups of occupational specialties. (This goes in with another test that Cattell has developed called the Vocational Interest Measure, which is built on exactly the same principles as the Motivational Analysis Test, only it is geared toward predicting in specific occupational groupings. The Vocational Interest Measure is being validated against the Kuder Preference Record.)<br />


Cattell says that when the results of the Motivational Analysis Test are applied to clinical and industrial work, criterion relations are equally significant and comprehensible in terms of the concept validities of the measures as they are for school achievement. Since Cattell is currently publishing the predictive details of most of his industrial research, it will be some months before these results are available.<br />

I feel that Cattell's work is very important to us, as he has broken through several barriers. First, by carefully reviewing the previous motivational materials and those of his own, he was able to come up with motivational factors which operate more or less in all adults. Some of these factors could be very valuable to us in our efforts to evaluate the total, on-the-job effectiveness of each soldier. For example, by utilizing test results on the relative tension levels of some of these factors, we could, after careful study of their significance, identify the cause of a soldier not working up to his best effectiveness.<br />

Second, Cattell developed test procedures whereby the variable being measured is so disguised that the subject usually has no desire to distort his response (e.g., on what is ostensibly an ability test) or else would be incapable of doing so in any systematic way because he is unaware of a "desirable response." Cattell has gotten away from the drawbacks of deliberate faking, of personal illusion, and of superficiality of measurement which have vitiated most "projective," "preference," and "opinionnaire" methods employed in interest batteries. Since Cattell has shown that a person's motivational makeup is based on both Integrated (conscious) and Unintegrated (unconscious) components, we can see that simply asking a person how he feels about something cannot possibly be sufficient to tell his motivational tension in the requested area.<br />

Third, Cattell utilized new scoring procedures, ipsative in nature, whereby he could learn the total strength of each of these drives in the individual and be able to make predictions from them.<br />

Although Cattell's Motivational Analysis Test is still a research instrument, it is a major step forward in motivational strength measurement. This test covers a person's interests, drives, and the strengths of his sentiment and value systems. It increases the reliability of conclusions regarding personality dynamics and helps to locate areas of fixation and conflict.<br />

The test concentrates on ten psychologically meaningful unitary motivation systems, established by comprehensive and objective factor analytic research. Also, it uses newly validated objective test devices for measuring these interest strengths.<br />


By centering on the main motivation systems demonstrated to be consistently present in the typical individual of our culture, rather than on "special purpose" arbitrary conglomerations, subjectively conceived interest divisions, or imaginary "drives," the test design recognizes the whole adult.<br />

It may be possible that we can use the research findings behind these tests. We can develop our own motivational factor tests from the great fund of current, well-founded, and documented research. Since we have large populations (servicemen) available to us, we can validate the tests rather easily. Also, with our quite adequate computer machinery to work with, we can easily do the required statistical work. The tests which we could develop can be used in three areas: (1) to supplement or replace parts of our rating scales, (2) to set up a criterion for validating and improving "supervisory and cognitive" portions of our present tests, and (3) to set up a criterion for validating our rating scales against.<br />

Conclusions<br />

When we look back at Chairman Bridges' chart here, and see the large percentage of variance that is not covered by general "achievement" type job-knowledge tests and rating scales; and when we see, in the one case of Mr. Shirkey's factor analysis of the Army's new rating scale, that the supervisory section of the Army's job mastery test does not load highly with the factor of Leadership Potential; and when we remember that the rating scale is influenced by many factors of the rater, ratee, and the content setting, perhaps this is a major area which we should explore. Chairman Bridges and I would now like to open the session to a general discussion of personality testing and job performance.<br />


Concluding Comments<br />

CLAUDE F. BRIDGES<br />

US Army Enlisted Evaluation Center<br />


After the stimulating preparatory presentations of General Zais, Captain Hayes, and Dr. Edgerton, the question arises, "What have we accomplished in this symposium?" Our expectation was not that we would immediately make any breakthroughs into the outer space of testing. We did hope to go beyond mere exchange of procedures and obtain some stimulation and nascent ideas having potential for making inroads into the currently inadequately measurable percentage of job proficiency. I believe we have accomplished this purpose. These discussions have provided us with ideas, possible new approaches, or new applications of established techniques that are worthy of further exploration.<br />

Dr. Zaccaria may strike some traditionalists as a bit of a maverick in some of his usages of terminology, but the points made in the paper by him and Dr. Karp merit serious consideration. True psychological analyses of jobs, specifically for measurement and training purposes, and the effective determination of functional job requirements and standards should prove to be one of our most fruitful starting points. As Cronbach (1960) states, "The most important requirement for valid assessment is...a clear understanding of the psychological requirements of the criterion task." In this connection, several recent publications by Dr. Robert M. Gagne and Dr. Robert B. Miller are pregnant with possibilities.<br />

The papers by Mr. Urry and Mr. Shirkey indicate not only that considerable improvement of even good tests can still result from further polishing, but also point out some new applications of statistical analysis and control techniques that yield appreciable increases in test validity.<br />

As expected, Lieutenant Longo's paper provides several significant contributions. The Naval Air Technical Training staff, under Dr. Mayo, has for many years been conducting extensive research on the prediction of job proficiency and on actual versus predicted achievement. This paper warrants some additional comments, and I would like to make some further suggestions on the use of an index based upon the relationship of (1) actual achievement, as measured by a proficiency test, and (2) expected achievement, as predicted by an appropriate aptitude test. Lt Longo pointed out that such a "discrepancy score" or, as I prefer to call it, "effectiveness index," reflects the composite effect of several variables, including both error variance and all variance not in common between the two variables. Included in these influencing variables are attitudes, work habits, locally unique job situation variables, interaction effects, and motivation; in other words, the effectiveness with which the man applies his aptitudes, skills, and knowledges to achievement in a specific situation.<br />
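The discrepancy-score idea can be made concrete with a small sketch. Here expected achievement is estimated by an ordinary least-squares regression of proficiency on aptitude, and the effectiveness index is the residual; the data, scales, and names below are invented for illustration only.<br />

```python
# Hedged sketch of an "effectiveness index": regress proficiency-test scores
# on aptitude-test scores, then take each man's residual (actual minus
# predicted achievement) as his index. All data are invented.

def simple_regression(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def effectiveness_indexes(aptitude, proficiency):
    slope, intercept = simple_regression(aptitude, proficiency)
    predicted = [intercept + slope * a for a in aptitude]
    # Positive index = over-achiever; negative index = under-achiever.
    return [actual - p for actual, p in zip(proficiency, predicted)]

if __name__ == "__main__":
    aptitude = [90, 100, 110, 120, 130]       # hypothetical aptitude scores
    proficiency = [55, 70, 60, 80, 85]        # hypothetical proficiency scores
    idx = effectiveness_indexes(aptitude, proficiency)
    print([round(i, 1) for i in idx])
```

By construction the residuals sum to zero, so the index ranks men against the achievement expected at their own aptitude level rather than against the group as a whole.<br />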


Two uses are proposed. First, careful study and analyses of the differences in characteristics of "over-achievers" and "under-achievers," using what Dr. Melvin R. Marks calls "the off-quadrant approach," should help identify variables that need to be covered better by the tests. Second is an adaptation of the usual procedure that may add significantly to the multiple correlations.<br />

In his doctoral dissertation, Reverend Brother James C. Bates, F.S.C., Consultant to the Provincial, Christian Brothers of Ireland, Canadian Province, recently analyzed some of the concomitants of the "learning effectiveness" of 381 male students in a liberal arts college. While checking out some of my hypotheses, he produced evidence that an appropriately developed effectiveness index combined with an achievement test and an aptitude test does yield a significantly greater multiple correlation with academic marks than using only the two variables from which it was derived. Naturally, the addition to a multiple regression equation of a variable which is a linear function of variables already included cannot increase the multiple correlation coefficient. Even using the ratio of the two should add little. The correlation between the two types of indexes usually approaches unity.<br />
the two types of indexes USUALLY approaches unity. ,<br />

However, in one part of Brother Bates’s study, the effectiveness<br />

index formula was developed from a total entering Freshman group (381)<br />

and then applied to the 274 students who graduated. When combined with<br />

the other predictor variables, these effectiveness scores increased ..he<br />

multiple correlation with first year college marks .OS points, which was<br />

significant at the .OOl level. Furthermore, the inclusion of this total,freshman-group-bnsed<br />

effectiveness index in the regression equation to<br />

predict average of allcollegeEarks for four years yielded the remarkable<br />

increase of .13 in the R. The canposfte of averc.:e high school marks,<br />

Essential High School Content Battery (the achievement test used in<br />

developing the effectiveness index) and the American Council Psychologic’s<br />

Examination (the aptitude test used),yielded an R of .56, which WAS<br />

raised to .69 by the addition of this type of effectiveness index.<br />

Incidentally, due to the hfgh reliability of the tests used to develop thlcm,<br />

these effectiveness indexes had A .93 reliability coefficient.<br />

This same procedure possibly could be adapted to the military situation. It should prove useful and relatively easy to develop in connection with successive levels of training. For the Army, an effectiveness index for a given MOS, based upon the best combination of aptitude measures versus the Enlisted MOS Evaluation Test scores and derived from a total MOS skill level population, might increase significantly the prediction of future proficiency and promotability for soldiers in one pay grade or other identifiable subgroup. Other possibilities for meaningful subgroups might be devised and checked out. Verification of the applicability to occupational specialties of the findings in Brother Bates's dissertation offers the possibility of yielding a simple, inexpensive way of significantly increasing validity. A true moderator variable seems to be involved here.<br />


Anyone investigating this area further will find several references helpful. A publication edited by Drs. DuBois and Hackett includes reports by Dr. Mayo and Dr. Froelich referred to by Lt Longo. Dr. Harris's book contains an excellent presentation of some of the complex statistical problems and theory involved in this area of research. Dr. Thorndike's little monograph examines closely the concepts and methodological problems and makes specific suggestions for sound research studies in this rather tricky area.<br />

The lines of approach reported in Mr. Kraft's paper offer some seminal points of departure. In fact, the greatest hope for a major increase in the percentage of job proficiency measured probably lies in obtaining more reliable and precise measures of the aspects of personality related to job proficiency: better ways of measuring the factors comprising the "will do" of enlisted personnel as identified by psychological job analyses and research. There is a good indication of the soundness of this conclusion. On 3 August 1964, in a letter on "Revision of Army Aptitude Area System," Headquarters, US Continental Army Command, stated as follows:<br />

". . . Numerous school-conducted studies have shown that age, previous education, and previous civilian and Army experience of students frequently correlate higher with course performance than do ACB test scores. It is recommended that appropriate statistical precaution be taken by USAPRO to insure that factors other than test results do not contaminate the validation.<br />

"Evidence presented at an April 1964 USCONARC Basic Electronics Conference, plus observations of numerous key personnel throughout the USCONARC school system, indicates clearly that subjective factors, such as attitude, desire to study, perseverance, and other non-intellectual attributes of USCONARC school students have as much or more influence on course performance of these students than "aptitude" as measured by a classification test battery such as is currently employed or contemplated by USAPRO. To the extent that inputs to USCONARC school and training center courses are governed by tools that take into account only those characteristics readily measurable and ignore more significant - though admittedly subjective - factors, there will be less than optimum regulation of manpower flow through the Army training system. USAPRO is urged to consider the development of measures of attitude, motivation and desire to learn as an integral part of its program to revise the Army Aptitude Area System."<br />


In their reply the US Army Personnel Research Office stated:<br />

". . . the background data reported will include not only ACB scores but also such factors as age, education, civilian experience, etc. These variables will be included in the analysis with the experimental test data.<br />

"The non-cognitive factors referred to . . . have indeed been demonstrated to be important in training performance, and in job performance as well. In this regard it is worth noting that the experimental battery contains several non-aptitude type measures, designed to evaluate those enduring interests, attitudes, and other personal characteristics which help determine what a man will do rather than only what he can do. One goal in the revised ACB is to provide a Classification Inventory which will measure personal characteristics predictive of performance in occupational areas such as electronics, mechanical maintenance, clerical-administrative, and other areas, just as the present CI measures personal factors which predicted combat performance in the Korean War.<br />

"In addition to these enduring characteristics, however, there are factors of motivation which are primarily situational. The classification battery cannot predict these, but this research program must take into account the effects of such factors on the findings. It will be very helpful to the USAPRO scientists if, in conjunction with this program, the training installations can contribute insights and data on such factors."<br />

In this theoretical symposium we have been able to do little more than touch on possible approaches to more effectively measuring some of the eight types of characteristics listed in the introduction. This afternoon a few more possible approaches will be presented. Some should prove productive, and all should be stimulating. For example, in some occupational specialties even gross methods of controlling item readability should yield profitable improvements, especially in tests for occupational specialties in which the level of reading ability of incumbents is a significant contributor to invalid variance of scores. These improvements should be even greater for such tests if three steps can be accomplished. First, if a more precise item readability index can be developed instead of using adaptations of the gross sampling methods that are adequate for masses of regular prose. Second, if a practical method can be developed for correcting this index for the special technical terminology characteristic of and common in the occupational specialty. Third, if some means of determining the distribution of reading ability in each specialty is feasible.<br />
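As one concrete illustration of the first two steps, a crude readability index can be computed from word and sentence length (in the spirit of Flesch-type counts, not any established Army formula) with a correction that treats specialty jargon known to incumbents as if it were ordinary short words. The weights, the example item, and the term list here are all hypothetical.<br />

```python
# Hedged sketch of an item readability index with a jargon correction.
# The 0.5 / 2.0 weights are arbitrary illustration values, not a validated
# formula; a real index would be calibrated against reading-ability data.
import re

def item_readability(text, technical_terms=frozenset()):
    """Higher score = harder reading. Technical jargon familiar to incumbents
    is counted as if it were a four-letter word so it does not inflate the index."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z'-]+", text)
    if not words or not sentences:
        return 0.0

    def effective_len(w):
        return 4 if w.lower() in technical_terms else len(w)

    avg_word_len = sum(effective_len(w) for w in words) / len(words)
    avg_sentence_len = len(words) / len(sentences)
    return round(0.5 * avg_sentence_len + 2.0 * avg_word_len, 2)

if __name__ == "__main__":
    item = "Adjust the magnetron before calibrating the oscilloscope."
    print(item_readability(item))
    print(item_readability(item, technical_terms={"magnetron", "oscilloscope"}))
```

With the specialty terms exempted, the same item scores as easier, which is the intended effect of the second step: long technical words that every incumbent knows should not be charged against the item.<br />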


To mention one more forthcoming approach, the "Performance Check Test" concept may offer a practical and logically sound way of effectively measuring motor skills in those occupational specialties for which such measures are important and for which the motor skills meet the required criteria.<br />

Making what Dr. Edgerton identified as genuine changes in testing obviously will require much effort, ingenuity, and a fortuitous concourse of circumstances.<br />

I am thus confident that at least some of the professional staff from each of the military services and in other research agencies will aggressively pursue any insights that may be stimulated by these symposia. We have much at stake and many of the same problems in common, as indicated both by Mr. Johnson's summary analysis and comparison of the programs of each agency and by your answers to the program planning questionnaires. These tend to emphasize the common interest in exploring ways of obtaining a marked increase in the percentage of overlap between job factors and the evaluations of enlisted personnel. You selected as the theme for this conference, "Increasing the Measuring Efficiency of Evaluation Instruments." Your responses to the program planning questionnaires indicate that, regardless of the current emphasis in the evaluation programs of the five services, an immediate goal of each service is the objective evaluation of as many significant factors in current job mastery as is practical. For all uses of the respective programs, appraising the current level of total job proficiency is important. For promotions, the future level of job proficiency is the important intermediate criterion. Of course, for all of the services the ultimate criterion is whether or not we win any military action and, for the individual enlisted man, how well he functions in his assigned job during such action.<br />


References<br />


Bates, J. C. An analysis of some of the concomitants of learning effectiveness. PhD dissertation, St. John's University, New York, 1965.<br />

Cronbach, L. J. Essentials of psychological testing (2nd ed.). New York: Harper, 1960, 589.<br />

DuBois, P. H. & Hackett, E. V. (eds.) The measurement and evaluation of over- and under-achievement. St. Louis: Washington University, 1962.<br />

Gagne, R. M. The acquisition of knowledge. Psychol Rev, 1962, 69, 355-365.<br />

Gagne, R. M., Mayor, J. R., Garstens, Helen L., & Paradise, N. E. Factors in acquiring knowledge of a mathematical task. Psychol Monogr, 1962, 76, No. 7 (Whole No. 526).<br />

Gagne, R. M. & Paradise, N. E. Abilities and learning sets in knowledge acquisition. Psychol Monogr, 1961, 75, No. 14 (Whole No. 518).<br />

Harris, C. W. (ed.). Problems in measuring change. Madison: The University of Wisconsin Press, 1963.<br />

Miller, R. B. Handbook on training and training equipment design. Wright-Patterson Air Force Base, Ohio: Wright Air Development Center, Technical Report 53-136, 1953.<br />

Miller, R. B. A method for man-machine task analysis. Wright-Patterson Air Force Base, Ohio: Wright Air Development Center, Technical Report 53-137, 1953.<br />

Miller, R. B. Some working concepts of systems analysis. Pittsburgh: American Institute for Research, 1954.<br />

Miller, R. B. Task and part-task trainers. Wright-Patterson Air Force Base, Ohio: Wright Air Development Center, Technical Report 60-460, ASTIA No. AD 245652, 1960.<br />

Miller, R. B. & Van Cott, H. P. The determination of knowledge content for complex man-machine jobs. Pittsburgh: American Institute for Research, 1955.<br />

Thorndike, R. L. The concepts of over- and under-achievement. New York: Bureau of Publications, Teachers College, Columbia University, 1963.<br />


SYMPOSIUM - LEROY JOHNSTON, CHAIRMAN<br />
US NAVAL EXAMINING CENTER<br />

Relationship Between Readability and Validity<br />
of MOS Evaluation Tests<br />

JOHN S. BRAND<br />

US Army Enlisted Evaluation Center<br />

Cronbach states in his book Essentials of Psychological Testing:<br />
"Tests having the same 'content' may measure different abilities because<br />
of variables associated with item form. Reading ability, for example,<br />
affects scores on almost all achievement tests. A valid measure of knowledge<br />
is not obtained if a person who knows a fact misses an item about<br />
it because of verbal difficulties." Cronbach goes on to show the superior<br />
validity of picture items over verbal items.<br />

A number of methods have been proposed for measuring the readability<br />
or reading difficulty level of written matter. The principal ones investigated<br />
by this office are those of Gunning, Flesch, and the Farr-Jenkins-<br />
Paterson modification of the Flesch method. These three methods, which<br />
are essentially variants of the same basic principles, were initially<br />
applied to several samples of written matter. It was concluded that the<br />
results obtained from the three methods were essentially the same. The<br />
Gunning method, however, has the advantage of being simpler, and was<br />
therefore adopted as a method for the measurement of readability.<br />

This psychologist is aware of only two papers dealing with the<br />
readability of tests. Neither of these presents methods for determining<br />
the readability of individual test items.<br />

Long words and long sentences have been shown to be the principal<br />
determiners of readability. The three methods mentioned above all use<br />
average sentence length as one determiner of reading difficulty. In<br />
determining average sentence length, independent clauses are counted<br />
as sentences.<br />

Flesch counts all syllables to determine the average word length.<br />
Farr-Jenkins-Paterson suggest that the same information may be obtained<br />
more easily by counting one-syllable words. Gunning counts words with<br />
three or more syllables to determine the percentage of "hard" words.<br />


In counting three-syllable words, Gunning makes the following<br />
exceptions:<br />
verb forms that are made three syllables by the addition<br />
of -es or -ed, for example: reduces, invested<br />
proper nouns or words that are capitalized<br />
combinations of simple words, like bookkeeper<br />

Gunning's Fog Count, as he calls it, is found by adding average<br />
sentence length and percentage of hard words, and multiplying the resulting<br />
sum by .4. The Fog Count is a grade equivalent: twelve equals high school<br />
graduate, 16 equals college graduate, etc. I have called the score<br />
resulting from the application of Gunning's method a "Readability Index",<br />
or RI.<br />
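The computation just described can be sketched as follows. The syllable counter and word splitter below are crude stand-ins of my own (they are not part of Gunning's procedure), and of Gunning's three exceptions only the capitalization rule is implemented:

```python
import re

def count_syllables(word: str) -> int:
    # Crude vowel-group heuristic; an exact count needs a pronunciation dictionary.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def is_hard(word: str) -> bool:
    # Gunning excepts capitalized (proper) nouns, compounds of simple words,
    # and verbs pushed to three syllables by -es/-ed. Only the capitalization
    # exception is implemented in this sketch.
    if word[0].isupper():
        return False
    return count_syllables(word) >= 3

def fog_index(text: str) -> float:
    # Fog Count = .4 * (average sentence length + percentage of hard words).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    avg_sentence_len = len(words) / len(sentences)
    pct_hard = 100.0 * sum(is_hard(w) for w in words) / len(words)
    return 0.4 * (avg_sentence_len + pct_hard)
```

The result is read directly as a grade equivalent, so a passage scoring 12 reads at the high-school-graduate level.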

The formulas developed by Gunning and Flesch were validated against<br />
the McCall-Crabbs reading tests, which were standardized against extensive<br />
student populations. This criterion is admittedly not a good one for<br />
adult reading materials, but it was the only one available for that<br />
purpose.<br />

Application of Gunning's Fog Index to MOS Evaluation Test Items<br />

Flesch and Gunning recommend taking 100-word samples of long articles<br />
to determine reading difficulty. Counting only one alternative to avoid<br />
excessive repetition, the average test item has about 20 words. This is<br />
admittedly a small sample. However, much useful information may be gained<br />
if a readability index can be determined for the individual item. An<br />
achievement-type item will contain one or more complete sentences to<br />
determine average sentence length, and the number of 3-syllable words<br />
in an item can be counted to determine the percentage of "hard" words.<br />

In counting 3-syllable words, decisions must be made about counting<br />
numbers and series of letters, or numbers and letters, which occur fairly<br />
frequently in MOS evaluation tests. In this respect, it is believed<br />
that Gunning's system is more adaptable to determining the readability<br />
of MOS evaluation test items than that of Flesch.<br />

Determination of Validity (ric)<br />

A criterion of job performance consisting of 3 peer ratings per<br />
subject on an overall performance scale has been obtained by this Center<br />
for samples of EM in selected MOS. These ratings were obtained in field<br />
trips to selected military installations. Item validity (ric) was<br />
obtained as the point biserial, or Pearson, correlation between item and<br />
criterion.<br />
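For a dichotomously scored item the point biserial is algebraically identical to the Pearson correlation between the item column and the criterion. A minimal sketch (variable names are illustrative, not the Center's):

```python
from statistics import mean, pstdev

def point_biserial(item: list[int], criterion: list[float]) -> float:
    # r_pb = (M1 - M0) / s_y * sqrt(p * q), where M1 and M0 are criterion
    # means of passers and failers, s_y is the population SD of the
    # criterion, and p, q are the pass and fail proportions.
    n = len(item)
    ones = [c for i, c in zip(item, criterion) if i == 1]
    zeros = [c for i, c in zip(item, criterion) if i == 0]
    p = len(ones) / n
    q = 1 - p
    return (mean(ones) - mean(zeros)) / pstdev(criterion) * (p * q) ** 0.5
```

Running the same data through an ordinary Pearson r gives the identical value, which is why the text treats the two names interchangeably.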


Analysis<br />
A bivariate distribution was prepared between readability level,<br />
shown on the horizontal axis in intervals of 3, and ric, shown on the<br />
vertical axis in intervals of .10.<br />

Correlational analysis to detect nonlinear relationships in one<br />
sample produced nonsignificant results.<br />

The ric was computed for each column of the scatterplot. Each<br />
column represents a different level of readability in intervals of 2,<br />
e.g., 2-3, 4-5, etc., through 22-23, 24-25, etc.<br />

The items of the test were divided in several ways on the basis of<br />
readability level. The ric was computed for items above a certain readability<br />
level, and the ric was computed for items below this level. If<br />
the former was lower than the latter, the significance of the difference<br />
was tested. A one-tailed test was used. Degrees of freedom for each<br />
ric was taken as the number of items included in computing the ric times<br />
N-3 of the validation sample. In other words, each item contributes<br />
N-3 degrees of freedom.<br />

The critical readability level was taken as that division point which<br />
yields the largest z score or critical ratio between the ric's obtained<br />
for the two parts of the test.<br />

The ric's so obtained were also tested for significant deviation<br />
from zero.<br />
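A critical ratio between two correlations is conventionally computed through Fisher's r-to-z transformation. The sketch below is a reconstruction under that assumption, not necessarily the authors' exact computation; the paper's item-based degrees of freedom (items times N-3) would simply be supplied as the effective sample sizes:

```python
import math

def fisher_z(r: float) -> float:
    # Fisher r-to-z transformation.
    return 0.5 * math.log((1 + r) / (1 - r))

def z_for_difference(r1: float, n1: int, r2: float, n2: int) -> float:
    # Standard large-sample test for two independent correlations:
    # z = (z1 - z2) / sqrt(1/(n1-3) + 1/(n2-3)).
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se
```

With a one-tailed test, as in the text, a resulting z of 1.65 or more would be significant at the .05 level.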

To date, results have been obtained in four MOS. Results for two<br />
of these are shown in the following tables. These two MOS include<br />
some 40,000 EM.<br />


MOS 94B, Cooks<br />
Inspection of the column ric's shown in Table 1 suggests that items<br />
above a readability level of 9th grade fail to predict the criterion.<br />
The ric for the 53 items with RI greater than 9th grade is .006, which is<br />
not significantly different from zero. The ric for the 44 items with<br />
RI of 9 or less is .051. This ric is significantly different from<br />
zero. The difference between the two ric's is significant at the .01<br />
level.<br />
It is therefore concluded that the 53 items with RI greater than<br />
9 contribute little to the validity of this MOS Evaluation Test.<br />
The obtained validity of the test was .10. Whatever validity the<br />
test has appears to come primarily from items with lower levels of<br />
readability.<br />

MOS 111.6, Light Weapons Infantryman<br />
Results for the 100 technical items are shown at the left of Table<br />
II, and results for the 25 supervisory items are shown on the right.<br />
In the technical test, 18 items with RI of 20 or higher have an<br />
average validity of 0.014. The validity of the test, therefore, which<br />
is .25, derives principally from items with an RI below 20. The ric<br />
for items with RI below 20 is .065, which is significantly different<br />
from zero. The difference between the two ric's is significant at<br />
the .05 level.<br />
The supervisory test can be divided at 14 items with RI up to 13,<br />
and 11 items with RI of 14 or higher. The ric's are respectively .086<br />
and .035. The former is significantly different from zero and the<br />
latter is not. The difference between them is not statistically significant.<br />

If the trends reflected in the results obtained to date are borne<br />
out by additional MOS samples, it is believed that a strong relationship<br />
will have been demonstrated between the readability of test items and<br />
item-criterion correlations. Steps can then be taken to write test<br />
items at appropriate levels of readability. The result should be a<br />
significant increase in the power of items to predict job proficiency<br />
criteria.<br />


TABLE I<br />
MOS 94B Cook. N = 129 (3 peer ratings). Apt. Area GT. Validity = .10.<br />
[Column ric's by RI interval: .04, .06, .05, .05, .02, .00, .02, .00, -.02, -.06, .01. Items with RI of 9 or less: ric = .051, t = 3.92, p < .0001. Items with RI above 9: ric = .006. z for the difference = 2.50, p < .01.]<br />




Performance Check Tests<br />

CLAUDE P. BRIDGES<br />

US Army Enlisted Evaluation Center<br />

Background<br />

For some occupational specialties, one way of making significant<br />

inroads into the variance in job proficiency currently unmeasured would<br />

be to measure more directly and effectively proficiencies involving motor<br />

skills.<br />

In some occupational specialties, physical manipulation skills (motor<br />
skills or the products thereof) are crucial and important determiners<br />
of significant portions of job proficiency levels. The development of<br />
adaptations of written test techniques which will adequately measure such<br />
manual skills, at least indirectly, may be possible ultimately in some of<br />
these jobs. The correlation between a man's knowledge and his use of the<br />
knowledge on the job often tends to be quite high. However, there are<br />
several occupational specialties in which direct measures of such physical<br />
manipulation skills still will be necessary for good coverage of the job.<br />

In spite of the difficulties inherent in the world-wide use of performance<br />
tests, it has been possible, in the US Army Enlisted Military<br />
Occupational Specialties (MOS) in which such skills are most basic to<br />
differences in job proficiency, to develop performance tests which can<br />
be administered world-wide under standard conditions and the results<br />
evaluated in an acceptably standard manner. However, such tests are quite<br />
expensive to develop, administer, and evaluate--often prohibitively so.<br />
As a result, it is quite important that every effort be made to explore<br />
other possibilities for directly evaluating motor skills and their<br />
physical products and to include such new measurement techniques among<br />
the evaluation instruments.<br />

This paper presents one such technique and the procedure for developing<br />
the required instrument. Originally I called it a Performance Check List.<br />
However, this term is already used commonly for a significantly different<br />
type of instrument. In order to emphasize one of its two most important<br />
characteristics I now refer to it as a "Performance Check Test," or in<br />
short a "PCT." On the basis of extensive experience in this measurement<br />
area, analysis of the pertinent research literature, and logical considerations,<br />
the technique proposed herein can be expected to function effectively<br />
in the occupational specialties for which it is most appropriate, i.e., in<br />
the jobs which meet all the specified criteria.<br />


When developed with ingenuity, we may be surprised at some of the<br />
areas in which PCT's would be appropriate. For example, Dr. Owen Waveless,<br />
an international specialist in linguistics, assures me that the development<br />
of an instrument along these lines for interpreters should be possible.<br />
He said that such an instrument could be used effectively by an observer<br />
NOT skilled in a given language to evaluate an examinee's language proficiency.<br />
Even though it would NOT be a completely adequate substitute<br />
for the more precise measures yielded by a good performance test, it would<br />
be much better than a traditional rating scale alone.<br />

WHAT IS A PERFORMANCE CHECK TEST?<br />

The proposed instrument is a true test developed in much the same way<br />
as a good written test, but the items are answered about--not by--the examinee,<br />
and each item describes one specific manipulation, skill, or<br />
product that is marked by the observer, or supervisor, as being observed,<br />
or not observed, in the examinee's performance of the specified tasks of<br />
his job. It differs from the usual performance check list in that each<br />
item has been selected on the basis of experimental analysis (using<br />
standard item analysis techniques) from a much larger list of specific<br />
proficiencies that first, can be readily and directly observed or<br />
inferred, second, can be so defined that any qualified observer would<br />
consistently state that the soldier observed either did or did not adequately<br />
perform the defined manipulation, and third, distinguish between<br />
characteristics of various levels of proficiency. In some instances the<br />
manipulation described by the item might be a specific task. However,<br />
most items would be limited to distinct individual skills or manipulations<br />
involved in completing such tasks. For example, one item in a performance<br />
check test for a bandsman might be, "Consistently produces high C on his<br />
instrument when called upon to do so." A bandmaster should be able to<br />
answer this question about every one of his bandsmen, even without any<br />
special observation of them.<br />
special observation of them.<br />

Different items should describe consistently observable levels of<br />

skills possessed by examineee with different level5 of job proficiency,<br />

Some items should describe skills whfch are possessed only by the most<br />

proficient soldier in the MOS; Borne should cover skill8 which are<br />

porseoeed by the highly proficient but not by a man with average prof<br />

iciency; come should cover skills possessed by the average man but not<br />

poclstseed by those low in proficiency; a few should cover skill8 that a<br />

man low in proficiency can do but that a novice with some familiarity with<br />

the ffeld can not do at the level described by the item. Naturally,<br />

neither items covering things which practically everyone in the job can<br />

do, nor items which no one can do should be included for measurement purposes,<br />

.<br />

93<br />

---<br />

-<br />

,<br />

, . !<br />

1 .<br />

I<br />

i<br />

6<br />

,


,<br />

s<br />

.<br />

. ---_.<br />

. . .<br />

.<br />

.<br />

- -.--. .-. ._- .*__ ____-.._ -.-..-- -- --.. _.-<br />

“-. _-- ._-__.- .- ---<br />

Ideally, the items should NOT require the observer to evaluate the<br />
level at which the individual performs the skill covered by the item.<br />
In other words, it should not be necessary to make qualitative judgments<br />
or inferences in deciding whether or not a given examinee performs the<br />
skill in the manner defined by the item.<br />

In summary, the most important distinguishing characteristics of a<br />
Performance Check Test are that (1) the performances can readily be observed,<br />
(2) they can be consistently judged as being present to the degree described,<br />
or absent, and (3) they are selected by appropriate item analysis and<br />
validation techniques.<br />

The Performance Check Test might be considered as consisting of a<br />
special type of rating scale. However, the PCT is not a usual type of<br />
rating scale. It is a list of things which an observer can easily determine<br />
that a man can or cannot do. These should be weighted on the<br />
basis of experimental findings which will indicate each item's importance<br />
as a component of the examinee's effectiveness or job proficiency. The<br />
quantitative expression of the level of skill in the manual manipulative<br />
aspects of the job would be obtained merely by adding up the weights<br />
assigned to the items checked for the individual evaluated on the<br />
Performance Check Test. A total score reliably and validly reflecting<br />
the examinee's skill is obtained. It covers discriminating motor aspects<br />
of the job. When appropriately weighted and combined with the written<br />
test and other job proficiency evaluations, a significant increase, and<br />
in some jobs a considerable increase, in validity should result. Thus<br />
additional inroads into currently unmeasured variance in job proficiency<br />
should result.<br />
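The scoring rule just described reduces to summing empirically derived item weights over the items the observer checked. A minimal sketch; the item names and weights below are hypothetical, not drawn from any actual PCT:

```python
def pct_score(weights: dict[str, float], checked: set[str]) -> float:
    # Total PCT score: the sum of the empirically determined weights of the
    # items the observer marked as observed for this examinee.
    return sum(w for item, w in weights.items() if item in checked)
```

With unitary weights this collapses to a simple count of items checked; the text's point is that non-unitary weights should be retained only if the experimental analysis shows they add validity.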

SUMMARY OF DEVELOPMENTAL PROCEDURE<br />

1. Identify the jobs for which current evaluation instruments are<br />
least valid, in which motor skills or their products are important<br />
discriminators of differences in levels of job mastery, and in which<br />
the tasks involved most adequately meet the criteria. (See Annex 1.)<br />

2. Analyze all of the pertinent information available or readily<br />
attainable and prepare a preliminary list of items to serve as an<br />
illustration for the subject-matter experts engaged in developing a long<br />
experimental list of potential PCT items. The "critical incident" approach<br />
developed by Dr. John C. Flanagan would be quite helpful. The "essay<br />
approach" proposed by Mr. Roberts, a USAEEC Supervisory Research Psychologist,<br />
and his staff should be another helpful source of items. (See Annex 2 for<br />
their materials.)<br />

3. Convey these examples together with guidelines and appropriate<br />

accessory materials to the appropriate experts on the occupational<br />

specialty involved in the organizations at which such experimental items<br />


are to be developed. Discuss with the subject-matter experts the possible<br />
variations of the procedure so that they have clearly in mind what is<br />
desired and how to provide it. (See Annex 3 for a list of criteria for<br />
PCT items.)<br />

4. Develop an experimental form of the PCT, considering the "item<br />
criteria" listed in Annex 3 and including all of the items which are judged<br />
to be promising.<br />

5. Administer this experimental form to a group of men working in<br />
the occupational specialty, as close as is feasible to the date on which<br />
the specialty is being evaluated by the regular instruments. In the<br />
sample used, all levels of proficiency should be represented. At least<br />
two, and preferably three or more, observers of each man should apply the<br />
experimental PCT. At the same time the best possible external criterion<br />
data should be obtained.<br />

6. Analyze the items and the possible ways of scoring each against<br />
the total score on the experimental form of the PCT, against the written<br />
test, and against the external criterion. Obtain correlations also with<br />
all of the regular evaluation instruments. The consistency with which<br />
each item can be used should be analyzed. Techniques similar to those used<br />
in refining biographical information blanks and to regular item analyses<br />
should be applied. The desirability of assigning other than unitary<br />
weights to the individual items in the check test should be determined<br />
empirically. The most valid and reliable scoring points for each item<br />
should likewise be determined empirically.<br />

7. Considering all the available data and the pertinent general<br />
measurement criteria, develop the revised form of the Performance Check<br />
Test. If extensive modification of a significant proportion of the experimental<br />
items has not been necessary, a useful estimate of the reliability<br />
of the total score on the revised Performance Check Test (PCT)<br />
can be obtained at this time. The data also can be analyzed in relation<br />
to minimum acceptable proficiency and promotion. Optimal weights of all<br />
the evaluation instruments naturally should be determined for the experimental<br />
tryout population and appropriate personnel actions determined<br />
for critical "cut scores."<br />

For jobs in which it is possible to obtain an adequate number of PCT<br />
items, each of which can be consistently observed in a performance as<br />
meeting or not meeting the described limits--e.g., for which the observer<br />
can reliably say, "This man can (or cannot) do this thing as specified"--<br />
and when this crucial statistical analysis is made, the effectiveness<br />
of this Performance Check Test technique should be assured. The percentage<br />


of the variance in job proficiency measured by the battery of evaluation<br />
instruments should be appreciably increased in any job in which motor<br />
skills or their direct products are significant determiners of differences<br />
in job mastery, and thus in job proficiency.<br />

Reference<br />
Adkins, Dorothy C., Primoff, E. S., McAdoo, H. L., Bridges, C. P., &<br />
Form, B. Construction and analysis of achievement tests.<br />
Washington, D. C.: US Government Printing Office, 1947, 211-265.<br />


ANNEX 1<br />
CRITERIA FOR IDENTIFYING APPROPRIATE OCCUPATIONAL SPECIALTIES<br />
The general characteristics of occupational specialties in which<br />
performance check tests might be expected to make a significant contribution<br />
toward increased validity are summarized below:<br />

a. Performance Check Tests should be considered for occupational<br />
specialties in which special motor skills or the direct products of such<br />
skills are critically involved. These skills must NOT be common to most<br />
men. Prominent among such occupational specialties probably should be<br />
those having as their principal duty the operation, and perhaps in some<br />
instances the maintenance, of equipment.<br />

b. The motor skills in the job must be consistently and readily<br />
observable to a clearly definable level by an adequately informed evaluator.<br />

c. The motor skills in the job must be crucial determiners of<br />
differences in levels of proficiency, i.e., the physical skills must play<br />
an important role in discriminating between relative levels of proficiency.<br />
Frequently MOST of the duties of such MOS will entail special physical<br />
skills employed in the manipulation, use, and/or adjustment of tools,<br />
equipment, mechanisms, and/or materials. Typical are the equipment<br />
operators, drivers, mechanics, machinists, bandsmen, some maintenance<br />
technicians, and some automatic data processing equipment technicians.<br />



ANNEX 2<br />
EXPLANATION<br />

Presently, bandsman performance tests are being administered by<br />
having bandsmen tape record certain standard musical passages. The tape<br />
recordings are then evaluated by specially selected and trained audition<br />
boards. This method of performance testing is time consuming, expensive,<br />
and poses many problems from an administration and scoring viewpoint.<br />
Any differences in scoring from one audition board to another decrease<br />
the reliability of the score proportionally.<br />

The US Army Enlisted Evaluation Center is conducting a study<br />
concerned with the development of bandsman performance evaluation scale(s)<br />
which can be completed while the supervisor observes the bandsman performing<br />
his duties from day to day and/or by having the individual<br />
bandsman play special musical passages or perform special duties.<br />

To construct bandsman performance evaluation scales, it is necessary<br />
to obtain information from as many sources as possible concerning the<br />
important specific performance behaviors that are demonstrated by both<br />
good and poor bandsmen in the performance of their required duties. Individuals<br />
holding a particular MOS are among the best equipped to know<br />
what are the important types of behavior. One way of collecting this<br />
information from the individual bandsman is to have him write a short<br />
essay about the individual he considers to be the best performer in his<br />
MOS and to write a short essay about the individual he considers to be<br />
the poorest performer in his MOS.<br />

You are requested to write essays on the attached sheets in accordance<br />
with the brief directions given at the top of the respective sheets. One<br />
sheet is to describe the best performer you know; the other sheet is to<br />
describe the poorest performer you know.<br />



Page 1<br />
GRADE --<br />
MOS --<br />
Length of<br />
Time MOS Held --<br />
Total Years Music<br />
Training --<br />
Total Years Experience<br />
as a Musician --<br />
On this page write a short essay describing the on-the-job performance<br />
behavior of the best bandsman you know in your MOS. Emphasize the things<br />
he can do better than most in the MOS or can do that others cannot. It is<br />
important that you write your essay from your own viewpoint. Do not<br />
consult others to determine what you should write. The combining of<br />
individual viewpoints will be much more valuable than having group<br />
opinions presented as individual viewpoints. Write on the back of the<br />
page if necessary or use an additional sheet of paper.<br />


Page 2<br />
MOS --<br />
Length of<br />
Time MOS Held --<br />

On this page write a short essay describing the on-the-job performance<br />
behavior of the poorest bandsman you know in your MOS. Emphasize<br />
the things he cannot do at all or as well as the typical bandsman in the<br />
MOS. It is important that you write your essay from your own viewpoint.<br />
Do not consult others to determine what you should write. The combining<br />
of individual viewpoints will be much more valuable than having group<br />
opinions presented as individual viewpoints. Write on the back of the<br />
page if necessary or use an additional sheet of paper.<br />


ANNEX 3<br />

CRITERIA FOR PERFORMANCE CHECK TEST ITEMS<br />

1. Items should require as little judgment as to quality as possible.<br />
Ideally, each item would be evaluated as being "possessed" or "not possessed"<br />
by the examinee.<br />

2. Items should emphasize "critical" motor skills that discriminate<br />
between various levels of performance.<br />

3. Readily observable differences should exist in the motor tasks<br />
(or portions thereof) described by the item.<br />

4. The performance can be evaluated objectively in that it is<br />
directly measurable (i.e., can be evaluated as to percent accuracy of<br />
performance, a count can be made of the specific number of times something<br />
was done, it is performed at the defined level, etc., rather than having to<br />
evaluate it in a highly subjective manner where the standards will vary<br />
greatly from one observer to another).<br />

5. Items selected for objective evaluation should be a representative<br />

trampling or cross section of the overall performance in order that the<br />

evaluation will not be one-sided or unfair to 6ome.<br />

6. Items of performance selected should be those that can be<br />

Ipresented in a uniform manner by different observers at different locations.<br />

7. Items should be those which when being used will have a high<br />

degree of agreement among observers.<br />

8. Items requiring a special performance of standard task may be used<br />

when required, but otherwise should be avoided to simplify completion of<br />

the PCT.<br />

9. Item6 which can be evaluated better and more efficiently by paper<br />

and pencil testing should not be included in this list. No item should<br />

have a lower correlation withtotal PCT score than with written test score.<br />

10. Items selected should be considered in terms of use for minimum<br />

qualification, proficiency, and pranotion score determination. This implies<br />

that items may need to be restricted to activities commonly per-<br />

IEormed by most or all.<br />

11. PCT items must conform with usual measurement principles and<br />

item and test criteria.<br />
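Criteria 1 and 7 can be illustrated with a modern sketch (Python; the score matrices are invented for illustration, not from the report): for binary "possessed / not possessed" items, inter-observer agreement reduces to a simple proportion of matching judgments per item.

```python
# Percent agreement between two observers scoring the same examinees
# on pass/fail PCT items (rows = examinees, columns = items,
# 1 = "possessed", 0 = "not possessed"). Data are illustrative only.

def item_agreement(obs_a, obs_b):
    """Proportion of examinees on whom two observers agree, per item."""
    n_examinees = len(obs_a)
    n_items = len(obs_a[0])
    rates = []
    for i in range(n_items):
        matches = sum(1 for a, b in zip(obs_a, obs_b) if a[i] == b[i])
        rates.append(matches / n_examinees)
    return rates

obs_a = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
obs_b = [[1, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(item_agreement(obs_a, obs_b))  # -> [1.0, 0.75, 0.75]
```

Items whose agreement rate stays low across observer pairs would fail criterion 7 and be candidates for removal.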


Reliability of Checking Computer Produced Statistics

CASIMER S. WINIEWICZ
US Naval Examining Center

Introduction

With the advent of computer processing, it became possible to reduce large amounts of raw data into more easily handled and more meaningful forms. Previously, where estimates of the statistical characteristics of examinations, such as Lawshe's D or Flanagan's r based on either the upper and lower 25 or 27% of a sample, were used, it became possible to switch to the original point-biserial correlation concept based on the total sample. Where computer configurations are extended to include a tape system, it becomes possible to expand the sample to include the entire group of examinees taking a particular examination.

Computer processing, when applied to item analysis, results in decreased processing time and earlier availability of results. Item correlations and difficulty indices based on the total sample result in more precise measurement, while the expansion of the sample size to include the entire population results in even greater stability of measurement.

However, computer processing has not been an unmixed blessing. With the onset of any new system, new problems are generated. One of these problems concerns the introduction of discrepancies into item analysis results.

The cause of these discrepancies can be divided into three general areas. The first area could be labeled Program Error. Although new computer programs are tested on actual samples of the data they will process, there is a limit to the exhaustiveness of these tests. In areas of measurement where samples vary in size and characteristics, unforeseen program difficulties may arise.

The second area might be called Machine Error. All machines, even those components with few or no moving parts, are subject to failure when used over a period of time. In the read and punch components of the computer configuration, burned-out circuits may occur, causing a failure to read or punch out particular card columns. Such discrepancies are generally consistent in affecting a particular part of the item analysis processed before the failure is discovered. In the computer components, cooling fans may fail, causing over-heating which results in the variable loss of data. This type of error is the most difficult to spot by visual inspection because it is variable.


The third general area falls into the Human Error category. Examples of this type of error will vary from the use of the wrong answer key (easy to spot visually) to the inclusion in the sample of answer cards from another examination, or lack of coordination pertaining to different elements of the program.

In short, discrepancies do occur in machine-produced item analysis, and it is generally agreed that human inspection of item analysis results is necessary.

Methods

At this point it might be useful to consider some methods upon which visual inspection of machine item analysis results may be based.

The first method considered might be called the use of an outside criterion. One such outside criterion would concern the item difficulty index or item p-value. The p-value criterion states that a p-value may not be less than 0.00 nor greater than +1.00. That is, not less than zero percent of the sample may answer an item correctly, nor more than 100 percent. Another outside criterion would state that the item-test correlation or r-value (which is generalized as the discrimination index) may not be less than -1.00 nor more than +1.00 without further investigation.

Another method of checking item analysis might be called the intra-item consistency method. If the item r and p values are based on the total sample, and if the correct and incorrect item alternatives' r and p estimates are based on a high and low group (upper and lower 27%, for instance), then the correct alternative is being measured independently on two overlapping groups to yield similar although not identical results. Empirical studies based on a large number of items will yield probability tables for various degrees of discrepancies between these two sets of r and p values.
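The two overlapping estimates can be sketched side by side in modern terms (Python; the item responses and total scores are invented): the total-sample point-biserial and p-value on the one hand, and an extreme-group estimate of difficulty on the other.

```python
# Total-sample item statistics versus extreme-group estimates.
# Responses and totals below are invented for illustration.
import statistics as st

def point_biserial(item, totals):
    """Total-sample point-biserial between a 0/1 item and total scores."""
    p = sum(item) / len(item)
    m_pass = st.mean(t for t, x in zip(totals, item) if x == 1)
    m_all = st.mean(totals)
    s = st.pstdev(totals)
    return (m_pass - m_all) / s * (p / (1 - p)) ** 0.5

def extreme_group_p(item, totals, frac=0.27):
    """Item difficulty estimated from the upper and lower score groups only."""
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    k = max(1, int(frac * len(totals)))
    chosen = order[:k] + order[-k:]
    return sum(item[i] for i in chosen) / len(chosen)

item = [1, 1, 0, 0]
totals = [4, 3, 2, 1]
print(round(point_biserial(item, totals), 3))  # -> 0.894
print(extreme_group_p(item, totals))           # -> 0.5, matching total-sample p
```

Large disagreement between the two estimates of the same item, beyond what the empirical probability tables allow, would flag the item analysis for inspection.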

A more involved method of checking item analysis might be termed the item history method. This method is restricted to pre-tested or control items, and consists of observing changes in the p and r values of the right and wrong alternatives of an item from one examination period to another. Although the p-value of an item is computed independently of other items, its value may change due to changes in the examinee population or varying degrees of item compromise or obsolescence. The r-value would also be expected to have a history showing a degree of fluctuation which would be further affected by its dependence on the action of other items in the examination.

Another method of checking item analysis data is based on trends in the item results. When the p and r values of the examination items are viewed as a whole, trends may become apparent. For instance, if one or more clusters of negative r-values are observed without compensating groups of high r-values, and the raw score standard deviation is adequate on the basis of experience or a statistical model, these trends may indicate errors in the item analysis results.


A variation of this method would consist of averaging the r-values appropriately and comparing this average with the raw score standard deviation.

These methods have two important characteristics in common. First, they lack precision. The Outside Criterion, Intra-Item Consistency, Item History, and Trend methods of checking item analysis involve limits or ranges within which machine-produced item analysis data can vary for reasons other than error. Although probability estimates may be determined for individual item variations within a "range" on the basis of probability tables, the problem would remain as to the range of probability estimates which are apt to indicate error in the item analysis data.

The second common characteristic of importance concerns the expense involved in these methods. Checking the data, item by item, can be time consuming and requires the time and effort of professional personnel.

Example of Interprogram Consistency

A method of checking item analysis which may be called an interprogram consistency method has the advantages of both speed and precision for computer application.

This method assumes that there are two computer programs producing results on a given set of data. One program will produce a raw score mean and standard deviation, and the other the item analysis data.

The comparison of the results of the two programs is based on the relationship of the sum of the item p-values and the raw score mean, and the sum of the item reliability indices and the raw score standard deviation. Two further assumptions concerning these statistical characteristics are that raw scores are computed on the basis of the number of correct answers and that the item discrimination index is a point-biserial correlation (Gulliksen, 1958).

The computation of the sum of p and the sum of the reliability indices involves a third computer program which may be called the item-test covariance program. The computations involve summing the item p-values (Σp) to produce a mean, and the use of the formula

    s_t = Σ (i = 1 to K) r_it √(p_i q_i)

to produce a standard deviation, both derived from the item analysis (Gulliksen, 1958). The product r_it √(p_i q_i) defines the reliability index (Gulliksen, 1958).
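A minimal modern sketch of the item-test covariance computation follows (Python; the p-values and point-biserial r-values are invented). When the r-values are true point-biserials computed on the same data, the derived mean and standard deviation agree with the raw-score statistics except for rounding.

```python
# Derived mean and standard deviation reconstructed from item analysis:
# mean = sum of item p-values; s.d. = sum of item reliability indices
# r_it * sqrt(p * q). Input values are illustrative only.

def derived_stats(p_values, r_values):
    mean = sum(p_values)                           # sum of p
    sd = sum(r * (p * (1 - p)) ** 0.5              # sum of reliability
             for p, r in zip(p_values, r_values))  # indices
    return mean, sd

mean, sd = derived_stats([0.5, 0.8], [0.6, 0.4])
print(mean, round(sd, 4))  # -> 1.3 0.46
```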


A comparison of the two sets of statistics will reveal differences due to rounding error. The amount of this rounding error is basically controlled by the number of decimal places computations are carried out to before the rounding process is introduced. The raw score mean and standard deviation will contain rounding errors, as will each item p-value and point-biserial correlation. The production and summing of reliability indices will result in varying amounts of rounding error in the derived mean and standard deviation. Consequently, the difference between the derived mean and standard deviation and the mean and standard deviation based on the raw scores will vary from test to test.

One method that may be used to interpret the significance of these differences consists of developing an empirical probability table of differences based on data of known reliability. In its simplest form, this table would be produced by deriving means and standard deviations from a sample of one to two hundred sets of item analysis data, then comparing them with means and standard deviations developed from their respective raw score distributions. The mean and standard deviation of the distribution of absolute differences would form a rough probability table which may be used in evaluating results from future use of the interprogram method.
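The rough probability table can be sketched as follows (Python; the per-test (mean, s.d.) pairs are invented stand-ins for the one to two hundred past tests the text describes).

```python
# Empirical table of absolute differences between raw-score and derived
# statistics across past tests; a future difference lying far beyond
# the table's norms flags a processing error. Data are illustrative.
import statistics as st

def difference_table(raw, derived):
    """Mean and s.d. of |raw - derived| for the means and for the s.d.s."""
    d_mean = [abs(r[0] - d[0]) for r, d in zip(raw, derived)]
    d_sd = [abs(r[1] - d[1]) for r, d in zip(raw, derived)]
    return ((st.mean(d_mean), st.pstdev(d_mean)),
            (st.mean(d_sd), st.pstdev(d_sd)))

raw = [(100.0, 15.0), (90.0, 12.0), (75.0, 10.0)]
derived = [(100.02, 15.01), (89.98, 12.03), (75.01, 10.02)]
(mu_m, sd_m), (mu_s, sd_s) = difference_table(raw, derived)
# e.g. a new test whose mean-difference exceeds mu_m by several sd_m
# would be set aside for the item-by-item inspection described earlier.
```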

If computations are carried out to five decimal places before rounding, the difference between these sets of means and standard deviations due to rounding error will be minimal. Consequently, any difficulty that may occur in any of the three programs will yield results substantially outside those indicated in the empirical probability table. Under these conditions, the interprogram method can be a precise method of checking machine-produced item analysis.

The actual working time required, which includes the running of the item-test covariance program to arrive at the final results, is approximately one hour per thirty-two rates, or thirty-two 150-question tests.

The primary value of this method is to provide a precise and relatively quick check of sets of item analysis data that will separate out the ninety-five percent that is completely accurate.

In those instances where the differences between the derived and raw score means and standard deviations are too great to be accounted for by rounding error, the difficulty may be in the raw score, the item analysis or derived results, or finally an error in checking the derived results against the raw score results.

Reference

Gulliksen, H. Theory of mental tests. New York: Wiley, 1958.


MOS Evaluation Test Validation Procedures

RAYMOND O. WALDKOETTER
US Army Enlisted Evaluation Center

The following points will be covered with the intent to give a condensed familiarization of this Center's MOS Evaluation Test Procedures. The technical aspects of how it is done may be more readily assimilated by checking with the evaluation section and going to selected references.

1. The emphasis on test validation has been reinforced by the use of the special rating of job performance as a criterion.

2. The criterion is an appropriate rating sample of the job performance as experienced by peers under the guidance of the rating device.

3. Test validation is concerned with determining: first, just how the total evaluation test correlates with the criterion; second, what makes up the valid portion and segments, and their individual and total correlations; and third, just how the outline for test development can be used as a guide with the recommended number of items to increase validity.

4. The validating procedure computations are completed with the multiple correlation between ET, CER, and the criterion, with an additional validity coefficient given by the correlation between the weighted scoring formula and criterion.

5. Validation activities will accelerate to give a hoped-for impressive continuity in the qualitative and quantitative test control procedures.
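The multiple correlation in point 4 can be computed directly from the three pairwise correlations via the standard two-predictor formula; the sketch below (Python) uses invented coefficients, not the report's values.

```python
# Multiple correlation of two predictors (e.g. evaluation test and CER)
# with a criterion, from their pairwise correlations. Values invented.

def multiple_r(r1c, r2c, r12):
    """R for predictors 1 and 2 against criterion c; r12 is their
    intercorrelation."""
    r_sq = (r1c ** 2 + r2c ** 2 - 2 * r1c * r2c * r12) / (1 - r12 ** 2)
    return r_sq ** 0.5

# With uncorrelated predictors, R exceeds either single validity.
print(round(multiple_r(0.4, 0.3, 0.0), 3))  # -> 0.5
```

This is why the combined ET-and-CER prediction reported later in the paper runs slightly higher than either coefficient alone.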

The achievement of MOS test validation has always been a basic task for USAEEC, but it has received a new impetus this past March (1964) when the decision was reached to use a newly drafted special rating of job performance as a criterion measure. Since, due to physical limitations, it is not immediately possible to validate all MOS, the MOS consequently selected for validation were identified so that a maximum sampling of the personnel evaluated would be obtained during the prescribed test periods.

A short treatment of the criterion is in order here, and possibly a good word for peer ratings. An appropriate sampling of EM designated in specified MOS are rated by at least 3 co-workers and 1 supervisor. Reliability coefficients are estimated for ratings of each sample using a one-way analysis of variance (Winer, 1962). The ratees must have been known by the raters at least one month, observed several times a week, and have performed the duties of the Primary MOS. The mean co-worker rating given by 3 EM for each of the men in the particular validation sample serves as the criterion. Since a summary of buddy ratings in military research by Hollander (1954) emphasized the relevant values of peer ratings, this information was instrumental in the USAEEC research decision to apply the job performance criterion.
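The one-way-ANOVA reliability estimate can be sketched as follows (Python; the rating matrix is invented, and the formula shown — the reliability of the mean of k ratings per ratee — is one common Winer-style estimate; the Center's exact computation is not given in the paper).

```python
# Reliability of mean peer ratings from a one-way analysis of variance:
# (MS_between - MS_within) / MS_between. Rows = ratees, columns = raters.
import statistics as st

def rating_reliability(ratings):
    n, k = len(ratings), len(ratings[0])
    grand = st.mean(x for row in ratings for x in row)
    ms_b = k * sum((st.mean(row) - grand) ** 2 for row in ratings) / (n - 1)
    ms_w = sum((x - st.mean(row)) ** 2
               for row in ratings for x in row) / (n * (k - 1))
    return (ms_b - ms_w) / ms_b

ratings = [[4, 6], [2, 2], [7, 9], [3, 5]]  # invented eleven-point-scale data
print(round(rating_reliability(ratings), 3))  # -> 0.88
```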

An element of anxiety may surround the peer rating method in some quarters. It has been found wanting because of ineffective application in most instances. For example, young recruits could not generally be expected to rate leadership in the military setting because they do not have adequate knowledge of the role and no degree of experience in making such a judgment. The fault lies not in the rating process but in the ineffective preparation for its use. The comparison most apt may be that of trying to shoot a bull's eye with a defective weapon. A stand was made in our program to develop a rating form which would minimize rating format influences and insufficient knowledge of co-workers about each other. The result being that peer ratings of a relatively structured style are postulated with a sense of confidence toward obtaining a realistic estimate of job performance on an eleven-point scale.

The administration of rating was prefaced by special instructions to induce the raters' acceptance of the task in a more informed and responsible way. A research psychologist conducted the rating session, while the test control officer (TCO) at each installation was requested to schedule all of the available raters who were qualified to rate EM in the specified MOS. Groups of about 20 to 40 men were assigned to meet in suitable places for the rating sessions, which were usually completed in about 20 minutes.

Three phases of analysis compose the substance of the test validation procedure: (1) analysis of the total evaluation test; (2) analysis of the valid portion of the evaluation test; and (3) providing recommended numbers of items by evaluation test outline. These phases are organized to assist in securing the desired evaluation test specifications through test revision.

Initially, in the first phase the relationships between item statistics and test statistics are thoroughly delineated. Results of analysis in tabular form show the total test, technical test, and Broad Subject-Matter Areas (BSMA's) by number of items with the respective means, standard deviations, KR-20 reliability coefficients, validity coefficients, beta weights, the multiple R, and corrected R after shrinkage. Then follow in another table the correlation coefficients between each of the BSMA's, the BSMA's and the criterion, and the total evaluation test and criterion. In a third summary table the results reported for items include item p-values for the total MOS population, p-values computed for the validation sample, item standard deviations, item variances, item-test correlations, item-criterion correlations, indexes of reliability, and indexes of validity. Mr. Wry and Mr. Shirkey have presented in their reports, which treat the item relationships, a wider discourse with analysis and implications in this area, using the needed examples for illustration.
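The KR-20 reliability coefficient tabulated in this phase can be sketched from a 0/1 response matrix (Python; the matrix is invented for illustration).

```python
# KR-20 reliability: k/(k-1) * (1 - sum(p*q) / var(total)),
# computed from a 0/1 response matrix (rows = examinees, cols = items).

def kr20(responses):
    n, k = len(responses), len(responses[0])
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    totals = [sum(row) for row in responses]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    return k / (k - 1) * (1 - sum(pj * (1 - pj) for pj in p) / var_t)

# Two perfectly parallel items give a reliability of 1.0.
print(kr20([[1, 1], [1, 1], [0, 0], [0, 0]]))  # -> 1.0
```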

The second phase, concerning maximization of test validity by item selection, is approached by eliminating items from the original evaluation test which contributed nothing to validity. Correlation coefficients appear in tabular form showing the relationships between BSMA's, and each BSMA and the criterion, with the coefficient computed to indicate the validity of the revised evaluation test.

The third phase of the sequence in the test validating procedures occurs in a maximization of test validity with the optimal allocations of items according to the test outline. Techniques of multiple correlation which determine optimal item allocations have been formulated by Horst (1949) and Taylor (1950). These techniques reflect functions in relation to the correlations between BSMA's, their reliabilities, and validities. The correlations, reliabilities, and validities are systematically changed when the numbers of items per BSMA are altered. By increasing the less reliable BSMA's, validity is further enlarged, provided the valid variance is not measured also by other BSMA's. After a cluster analysis (Fruchter, 1954) of BSMA's the Horst technique is applied. (This procedure is applicable currently, but may give way in deference to the Wherry-Winer method for factoring large numbers of items.) In tabular form a summary is given of the cluster analysis showing each cluster, the optimal number of items per cluster, and the BSMA's per cluster with the valid number of items in each BSMA, the proportion of BSMA items in each cluster, and the optimal number of items per BSMA.

From a summary review of the validating procedures used with the test analysis, a brief description of the overall validity approach is desirable at this point. Since March, some six MOS tests, which evaluate about 17% of EM under the Army EES, have received validity analysis and evaluation based upon the criterion of job performance ratings.

The samples used range from 30 to 129 and were checked to assure representative groups. The evaluation test validity coefficients ranged from .10 to .52. The CER had validity coefficients ranging from .20 to .51. The multiple correlation coefficients between evaluation test, CER, and the criterion ranged from .36 to .55, demonstrating a slightly better prediction of validity from the ET and CER combined. The corrected multiple correlation coefficients after shrinkage range from .34 to .51.
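The shrinkage correction can be sketched with the standard Wherry adjustment (Python below); that USAEEC used exactly this formula is an assumption, and the inputs are invented rather than taken from the six MOS reported.

```python
# Wherry-style shrinkage correction of a multiple R for sample size n
# and number of predictors k: R_c^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1).

def corrected_r(r, n, k):
    adj = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    return max(adj, 0.0) ** 0.5  # clamp: small samples can drive adj < 0

print(round(corrected_r(0.55, 129, 2), 3))  # a small downward correction
```

The smaller validation samples (n near 30) shrink noticeably more than the larger ones, which is consistent with the corrected range reported above being lower than the uncorrected one.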


Let it be remarked here that although some validity coefficients have not reached entirely sanctioned levels of validity, we are attaining significant levels of validity. With the identification of weak points in validity we can utilize greater control, thereby beginning the necessary corrective test development procedures. For example, the validity coefficient for the MOS 941.1 Evaluation Test (Cook) was .10, which should be susceptible to improvement by simply selecting the proven valid items and adjusting the reading level of the test to that more usually experienced by cooks. From a comparison of validity coefficients I would like to hazard a conjecture that the validity coefficients tend to be higher for the more technical jobs of MOS than for the unskilled or perhaps motor-skilled jobs.

Another combined validity coefficient is given by the correlation between the raw composite score or scoring formula and the criterion. These coefficients for the six MOS ranged from .22 to .49.

To realize that validation procedures in terms of man-hours, data processing, and interpretation presuppose a high degree of technical organization requires very little analytical skill. But to utilize these procedures and insert the modifications needed to improve validation procedures demands that an organization reach a noticeable stage of maturity. The Evaluation and Analysis Branch of USAEEC, now embarked upon this stage of maturity, will continue to improve and increase its validation efforts in sampling all areas of the MOS structure.



References

Fruchter, B. Cluster analysis. Introduction to factor analysis. New York: D. Van Nostrand Company, Inc., 1954, 12-17.

Hollander, E. P. Buddy ratings: Military research and industrial implications. Personnel Psychol., 1954, 7, 385-393.

Horst, P. Determination of optimal test length to maximize the multiple correlation. Psychometrika, 1949, 14, 79-88.

Taylor, C. W. Maximizing predictive efficiency for a fixed total testing time. Psychometrika, 1950, 15, 391-406.

Wherry, R. J., and Winer, B. J. A method for factoring large numbers of items. Psychometrika, 1953, 18, 161-179.

Winer, B. J. Single-factor experiments having repeated measures on the same elements. Statistical principles in experimental design. New York: McGraw-Hill, 1962, 105-139.



Approaches to Improved Measurement: Research in Progress

CASIMER S. WINIEWICZ
US Naval Examining Center

Background

The research projects abstracted herein are primarily projects that the US Naval Examining Center has concentrated its efforts on since the last Military Testing Association meeting, held in 1963 at Groton, Connecticut. These projects should not be confused with the regular semiannual evaluations that are conducted on all examinations and their respective populations. To summarize a few of these automatic evaluations: each examination is analyzed in terms of its adherence to the criterion standard of construction; professional and military sections are analyzed independently, as are their interactions as major components of the total instrument; intercorrelations are computed between all minor subsections on the professional part, as well as with the total composite; item analysis; raw and standard score conversions along with their respective statistics and graphic representation of each population as well as the parameters; analysis of the final multiple components; a check for compromise and collusion; and finally all the various activity and bureau reports that are required to adequately summarize this information.

A summary listing of the following research projects indicates those unique areas that have been investigated to provide additional technical information to improve and support the Naval Advancement System.

Pro-Pay Survey

A survey was conducted on the prevalent attitude toward a variable reenlistment bonus (guaranteed minimum bonus with a variable sum based on rating criticality) vice a proficiency pay program. Ten thousand candidates were sampled from a cross-section of 231 different Naval Activities. The major extrapolation from the data tends to support the premise that, in general, the group would prefer a variable bonus system in lieu of the present pro-pay program. This is based very briefly on the fact that although only 25% of the total group are currently receiving pro-pay in one form or another, they favor pro-pay by only a 60 to 30 ratio. However, the larger portion of the sample, the remaining 75% that represents about 6,000 people in this survey, are in favor of the bonus over pro-pay by a 2 to 1 majority.

Automatic Examination Requisitioning

The examination answer cards have been revised to incorporate the collection of two variable factors, performance evaluation and awards. Basic battery scores are constant, and length of service and time in rate increase by a constant for each examination series. A proposed tape configuration added to the present computer would allow for the elimination of certain administrative responsibilities prior to the administration of each examination (elimination of the NAVPERS 524, which contains biographical information on each candidate). A master tape file continuously updated by NEC in conjunction with the collection of certain information directly from the answer card would eventually eliminate the necessity for ordering examinations, and they could be sent out automatically by name when each candidate becomes eligible, as indicated by the file maintained by NEC.

Actuarial Longevity Prediction Study

Since the literature only covers conventional statistical approaches to prediction, such as regression analysis or multiple regression, and in some cases the application of the Poisson distribution or the negative binomial, raw data were acquired to experimentally determine the feasibility of utilizing the actuarial approach in predicting longevity. Evidence to date indicates that this approach will eventually be utilized in certain areas of prediction as a routine technique. To fully appreciate the success attained by this approach, additional comparison studies between various predictive techniques will be conducted as to the eventual superiority of one in terms of minimizing error, fewer assumptions, time element, and the final extrapolations from the data that are possible.

Class Scheduling Project

A student, teacher, and class scheduling project was completed by hand based on raw data (N=2000) for a single school. The application of results will eventually be duplicated by a computer program and the two sets of results will be analyzed for comparability. The project will then be extended for application to the CNARESTRA problem (N=35,000), which will involve a series of schools located geographically in various parts of the country and eventually encompass the mobility of students from one area to another. Complete coordination will be possible between the transportation, teacher, student, school, classroom, time sequence, and course information factors; all will be in one complete computer program.

Differential Weights of Final Multiple<br />

The final multiple score is composed of five individual factor scores which are combined on the basis<br />
of a simple weighting formula. Each factor has a designated ideal contribution to the overall<br />
variance, but the actual or real factor contribution may and obviously does vary from this standard.<br />
A study was implemented to determine the empirical interaction of the five factors making up the<br />
total composite of the multiple variance of passing candidates in selected critical rates from<br />
previous examination series. The results indicated that the factor<br />


contributions to the final multiple variance differ from the apparent standard by rate and factor. On<br />
the basis of pay grade medians, differences from the standard were statistically significant for all<br />
factors at pay grades E-4 and E-5 and not significant at pay grade E-6.<br />

Pre-Testing for Controlled Examinations<br />

The Navy's regular advancement examination is composed of 150 four-option, multiple-choice questions.<br />
In the construction of these instruments only 50% of the questions have item statistics and are<br />
utilized to control the standard of the examination against the desired criterion. The remaining<br />
percentage of questions are new and generally cover the entire spectrum of difficulty. Closer<br />
adherence to overall standards can be acquired by simple pre-testing of the new questions on selected<br />
populations. This in essence will provide for a completely controlled examination from the standpoint<br />
of item difficulty and discrimination.<br />
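The two item statistics named here can be sketched as follows; the response matrix, the upper/lower 27% grouping rule, and the function name are illustrative assumptions rather than the examining center's actual pretest procedure.<br />

```python
# Item difficulty (proportion passing) and a simple discrimination index
# (upper 27% minus lower 27% of examinees, ranked by total score).
def item_statistics(responses):
    """responses: one list of 0/1 item scores per examinee."""
    n = len(responses)
    ranked = sorted(range(n), key=lambda i: sum(responses[i]))
    k = max(1, round(0.27 * n))          # size of the upper and lower groups
    low, high = ranked[:k], ranked[-k:]
    stats = []
    for j in range(len(responses[0])):
        p = sum(r[j] for r in responses) / n                  # difficulty
        d = (sum(responses[i][j] for i in high)
             - sum(responses[i][j] for i in low)) / k         # discrimination
        stats.append((p, d))
    return stats

# Ten examinees, three invented items: item 0 easy, item 2 hard.
data = [[1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 1, 0],
        [1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0]]
for j, (p, d) in enumerate(item_statistics(data)):
    print(f"item {j}: difficulty {p:.2f}, discrimination {d:.2f}")
```

Pretested items whose difficulty or discrimination falls outside the desired bounds would then be revised or dropped before entering the controlled item pool.<br />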

Reliability Computer Check Study<br />


A detailed approach to one aspect of reliability checking for computer operations is given in another<br />
part of this summary. The paper was presented independently as part of the Theoretical Seminar.<br />

Four Cycle Examining Periods<br />

The regular advancement examinations are primarily discriminating and are administered Navy-wide each<br />
February and August. The regular input at pay grade E-4 cannot be maintained in 22 critical rates<br />
without disturbing the balance of the total system. The possibility of examining these critical<br />
personnel during May and November with a qualifying type of examination has been investigated and<br />
proposed as a possible solution.<br />

Simplified Tri-Serial Correlation<br />

A byproduct of a larger project yielded a simplified triserial correlation, results of which were<br />
presented at the national 1964 APA conference. Jaspen's original formula for the triserial r is<br />
rather unwieldy and therefore has been passed by for some easier, less appropriate correlation<br />
techniques. Jaspen's formula equals:<br />

r_tri = [Za·Ya + (Zb − Za)·Yb − Zb·Yc] / [σy · (Za²/pa + (Zb − Za)²/pb + Zb²/pc)]<br />

where Ya, Yb, and Yc are the criterion means of the high, middle, and low categories; pa, pb, and pc<br />
are their proportions; Za and Zb are the unit-normal ordinates at the two category boundaries; and σy<br />
is the criterion standard deviation.<br />
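Jaspen's formula can be applied directly to trichotomized data; the sketch below uses invented scores and assumes the standard reading of the formula, with the unit-normal ordinates taken at the two category boundaries.<br />

```python
from statistics import NormalDist, mean, pstdev

def triserial_r(y_high, y_mid, y_low):
    """Jaspen's triserial r for a criterion y split into three ordered groups."""
    nd = NormalDist()
    y_all = y_high + y_mid + y_low
    n = len(y_all)
    pa, pb, pc = len(y_high) / n, len(y_mid) / n, len(y_low) / n
    # ordinates of the unit normal curve at the two trichotomy boundaries
    za = nd.pdf(nd.inv_cdf(1 - pa))
    zb = nd.pdf(nd.inv_cdf(1 - pa - pb))
    num = za * mean(y_high) + (zb - za) * mean(y_mid) - zb * mean(y_low)
    den = pstdev(y_all) * (za**2 / pa + (zb - za)**2 / pb + zb**2 / pc)
    return num / den

print(round(triserial_r([70, 80, 90], [55, 60, 65, 75], [35, 50, 60]), 3))  # → 0.913
```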


The formula derived, which gives results identical with Jaspen's formula, equals:<br />

I‘trf u<br />

CYa + (Na - Nu) (Yt) - L”y-2<br />

Ntuy (2, + 21)<br />

Weighting a Two-Subtest Composite<br />

Another byproduct, also presented at the APA conference, was the weighting of individual subtests in<br />
a composite, where the weights become more critical as the total number of tests in the composite<br />
decreases. The importance of controlling the weighting of individual subtests reaches a maximum when<br />
there are only two subtests in the composite. A method of controlling the contribution of each<br />
subtest in a two-test composite is given by the following formula:<br />

Za = (Wa + rab) / (1 + rab·Wa),    Zb = 1 − Za<br />
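One way to check such a weighting scheme numerically is the standard variance-partition view of the same problem: for a composite C = Wa·Xa + Wb·Xb of standardized subtests with intercorrelation rab, subtest a's share of the composite variance is (Wa² + Wa·Wb·rab)/Var(C). The function name and figures below are illustrative assumptions.<br />

```python
# Share of composite variance carried by each of two standardized subtests,
# given nominal weights w_a, w_b and their intercorrelation r_ab.
def variance_shares(w_a, w_b, r_ab):
    var_c = w_a**2 + w_b**2 + 2 * w_a * w_b * r_ab
    share_a = (w_a**2 + w_a * w_b * r_ab) / var_c
    return share_a, 1.0 - share_a

print(variance_shares(1.0, 1.0, 0.6))   # equal weights give equal shares
print(variance_shares(2.0, 1.0, 0.6))   # the heavier subtest dominates
```

Solving this relation for Wa so that share_a hits a target value is the kind of control over subtest contribution that the passage describes.<br />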

Open End Item Distractor Development<br />

Pay grades E-8 and E-9 have essentially an examination that is part aptitude and part achievement.<br />
Some of the subtests contain leadership and situational problem questions which are extremely<br />
difficult to develop. A technique of using an open-end question to collect responses was utilized.<br />
The various responses are then tabulated and sorted into similar groupings by a frequency count. The<br />
three most attractive incorrect replies are then merged with the correct answer to form an item along<br />
with the appropriate stem.<br />

Longitudinal Quality Control<br />

A quality control study is routinely conducted after each advancement cycle in order to determine<br />
adherence to standards. It has the advantage of presenting current information and acts as a warning<br />
system relative to overall examination quality. A longitudinal study based on previous quality<br />
control studies is now in progress; its effect will be cumulative in nature and will form the basis<br />
to predict the desired statistical characteristics of future examinations.<br />

Effect of Automatic Advancement<br />

Various programs at the minimum petty officer level (pay grade E-4) were introduced to act as a<br />
stimulus for reenlistment in the Navy. As a result of such programs as STAR, SCORE, and Class A<br />
school, automatic<br />


promotion to pay grade E-4 is possible without the necessity of taking an examination. Since the<br />
inception of these programs, which are a recent innovation, a certain normal quantity of the<br />
examination population is missing. This means in essence taking the quality off the top and having<br />
the remainder take examinations. The standard at E-4 will obviously change, and when this segment of<br />
the population is added back into the competitive examination population at the next higher level at<br />
E-5, certain ramifications will take place. This problem is currently being investigated.<br />

Forced Sampling Technique<br />

The forced sampling technique is a form of stratified sampling. It controls a sample by reducing<br />
biases caused by chance factors in random sampling. The procedure is to rank the total number of<br />
candidates from low raw score to high raw score. The next procedure is to pull the sample in such a<br />
way that all raw scores are represented as equally as possible in the sample. The formula used to<br />
choose the individual samples was Nt/Ns, where Nt equals the total population and Ns equals the<br />
desired sample size; every (Nt/Ns)-th ranked candidate is drawn. The forced sampling procedure has<br />
been found to be extremely accurate in representing the population median, mean, and standard<br />
deviation even with sample sizes of less than 50.<br />
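The procedure can be sketched as systematic selection from the ranked score list; the simulated population and the helper name below are assumptions for illustration.<br />

```python
import random

def forced_sample(scores, n_sample):
    """Rank by raw score, then draw every (Nt/Ns)-th case."""
    ranked = sorted(scores)
    step = len(ranked) / n_sample           # Nt / Ns
    return [ranked[int(i * step)] for i in range(n_sample)]

random.seed(1)
population = [random.gauss(100, 15) for _ in range(2000)]
sample = forced_sample(population, 40)

pop_mean = sum(population) / len(population)
smp_mean = sum(sample) / len(sample)
print(f"population mean {pop_mean:.1f}, forced-sample mean {smp_mean:.1f}")
```

Because every score stratum is touched, the sample tracks the population mean and spread far more tightly than a chance draw of the same size.<br />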

Population Analysis Instead of Random Sampling<br />

The sample sizes of the occupational rates are based on the limits of a confidence interval of a<br />
population. The confidence in the limits of a mean for a given parameter is the fiduciary<br />
probability. The fiduciary probability is better than .95 that the true mean lies in the interval<br />
M ± 2.00 S.E., and .05 that it falls outside the limits. A standard error of one raw score was the<br />
average standard error of our samples. The effect on item analysis and reliability of the examination<br />
is noticeable, although by increasing the number of candidates used in the studies the variance also<br />
increases slightly. However, the use of the total population in the studies provides more stability<br />
in all item statistics.<br />
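The interval described above can be illustrated numerically; the score list is invented, and 2.00 standard errors is used exactly as in the text.<br />

```python
from statistics import mean, stdev

scores = [52, 47, 60, 55, 49, 58, 53, 61, 45, 50]   # hypothetical raw scores
m = mean(scores)
se = stdev(scores) / len(scores) ** 0.5             # standard error of the mean
low, high = m - 2.00 * se, m + 2.00 * se
print(f"mean {m:.1f}, S.E. {se:.2f}, .95 interval ({low:.1f}, {high:.1f})")
```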

Validity<br />

Concentrated effort has been placed on obtaining indirect measures of validity through analysis of<br />
Class A school graduates versus non-school graduates, frequency breakdowns of various elements of the<br />
population, and tests of significance on the differences. Conventional validity through supervisory<br />
ratings and peer ratings has yielded validity coefficients that average around .35.<br />



Approaches to Improved Measurement:<br />

Research in Progress<br />

RONALD K. GOODNIGHT<br />

US Army Enlisted Evaluation Center<br />

Throughout the conference we have heard and discussed numerous topics spanning the realm of test<br />
improvement and development. In every case each method used, the results obtained, and the<br />
utilization of these results are all based on research. Research is the guideline to success in<br />
almost any endeavor. Therefore, I am pleased to present to you several of the more important research<br />
projects which have been completed, as well as some currently in progress, at the US Army Enlisted<br />
Evaluation Center.<br />

The first project, completed several years ago, is A Comparison of Seven Methods of Computing Total<br />
Test Reliability from a Single Test Administration. Test reliability (that is, a determination of<br />
test consistency) is evaluated by investigating the relationship of the predictor to itself; in other<br />
words, it is the relationship of the ranking of scores on one administration with the ranking on a<br />
subsequent administration. The reliability of a test can generally be estimated adequately from only<br />
one administration of a test.<br />

For each MOS Evaluation Test a reliability coefficient is computed via the Kuder-Richardson Formula<br />
20. However, it was imperative to know if this was the most accurate indication of the consistency<br />
with which the test was measuring job proficiency. Therefore, this project was designed to determine<br />
which of the odd-even, K-R 20 on the total group, K-R 20 on 50% of the group, Hoyt's Analysis of<br />
Variance, Horst Maximum, Horst Corrected, and Cleman's Maximum reliability measurement methods was<br />
superior. The results showed the K-R 20 reliability method on the total group is the most appropriate<br />
for the MOS Evaluation Tests, thus supporting its usage by the Enlisted Evaluation Center (EEC).<br />

This study was completed in February 1963, and presently another project, A Comparison of Five<br />
Methods of Computing Total Test Reliability from a Single Test Administration, is under way. This is<br />
a replication of the earlier study with some minor variations to further verify EEC's current<br />
procedures in test reliability estimation. Another reason for conducting this study lies in the<br />
proposed technical recommendations which were presented at the 1964 American Psychological<br />
Association meetings; these recommendations indicate that the K-R 21 is more appropriate to use than<br />
the K-R 20 under situations such as those at EEC. Therefore, in this study five methods of<br />
reliability estimation (K-R 20, K-R 21, odd-even, Hoyt's Analysis of Variance, and Horst's Maximum)<br />
are being compared to determine which is the better method of measuring test consistency in view of<br />
APA's recommendations, and to study the effects of sample size and MOS skill level on the various<br />
reliability coefficients. Results are not available thus far.<br />
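Three of the five estimates being compared can be sketched from a single administration of a 0/1-scored test; the response matrix below is invented, and the odd-even coefficient is stepped up with the Spearman-Brown formula, as is conventional.<br />

```python
from statistics import pvariance

def kr20(X):
    k = len(X[0])
    totals = [sum(row) for row in X]
    pq = sum(p * (1 - p)
             for p in (sum(row[j] for row in X) / len(X) for j in range(k)))
    return k / (k - 1) * (1 - pq / pvariance(totals))

def kr21(X):
    # needs only the test mean and variance, not item-level p values
    k = len(X[0])
    totals = [sum(row) for row in X]
    m = sum(totals) / len(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * pvariance(totals)))

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    return cov / (pvariance(a) * pvariance(b)) ** 0.5

def split_half(X):
    a = [sum(row[0::2]) for row in X]   # first, third, fifth item ...
    b = [sum(row[1::2]) for row in X]   # second, fourth, sixth item ...
    r = pearson(a, b)
    return 2 * r / (1 + r)              # Spearman-Brown step-up

X = [[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 0, 1], [1, 1, 0, 1, 0, 0],
     [1, 0, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]
print(f"KR-20 {kr20(X):.2f}  KR-21 {kr21(X):.2f}  odd-even {split_half(X):.2f}")
```

On any data set, K-R 21 can be no larger than K-R 20, since it assumes all items are equally difficult.<br />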


Another area of research is the longitudinal studies on the Commander's Evaluation Report (CER). The<br />
CER is the official rating form which constitutes one of the components of the Enlisted Evaluation<br />
System. The primary purpose of the CER is to provide an assessment of the soldier's job performance<br />
and potential for advancement by his immediate and secondary supervisors. Although ratings, per se,<br />
are not perfect, they are the best available evaluation method when no objective measure can be<br />
obtained; therefore, the Enlisted Evaluation Center is quite interested in the performance and<br />
improvement of the CER.<br />

Two studies presently being conducted to facilitate the continual improvement of the CER are A Factor<br />
Analysis of CER Scales, Technical Subtest, and Supervisory Subtest, and A Comparison of a Graphic<br />
Rating Method, a Normative Paired-Comparison Rating Method, and an Ipsative Paired-Comparison Rating<br />
Method via the Multitrait-Multimethod Matrix. The factor analytic study was to determine the degree<br />
of common-factor variance of each of the twelve CER scales and the technical and supervisory subtests<br />
of an MOS Evaluation Test. From the results of this research will come information necessary for<br />
proper revision of the CER to eliminate the measurement overlap or common-factor variance, thus<br />
improving the evaluative ability of the rating. Seven orthogonal factors emerged from the analysis;<br />
however, they have not all been named. One of the more important results thus far was the high<br />
loading of the technical subtest on the Initiative factor.<br />

The second rating study, presently in the planning stages, is A Comparison of a Graphic Rating<br />
Method, a Normative Paired-Comparison Rating Method, and an Ipsative Paired-Comparison Rating Method<br />
via the Multitrait-Multimethod Matrix. This research endeavor was designed to determine statistically<br />
whether the graphic rating method as used in CER ratings is adequately serving its purpose by<br />
providing valid and accurate information to the Enlisted Evaluation Center, or whether either the<br />
normative paired-comparison with certainty judgments method or the ipsative paired-comparison with<br />
certainty judgments method would be superior and provide more accurate and valid results. Also, since<br />
the normative and ipsative data are supplementary to each other, it is possible to statistically<br />
combine the data from these two methods, thus yielding a fourth rating method for the analysis. It is<br />
felt that this information will be valuable in assessing the adequacy of the CER rating method in<br />
comparison to the other rating methods. No data have been collected yet.<br />

One very important previous study was The Effect of Rater-Ratee Acquaintance Period on CER Ratings.<br />
This research project was designed to determine if a specified minimum period of acquaintance between<br />
the rater and ratee was necessary for satisfactory and reliable CER ratings. The results obtained on<br />
various MOS skill levels indicated that a minimum period of two months (60 days) of acquaintance was<br />
essential for proper evaluation. This time period is now mandatory in the Department of the Army for<br />
all CER ratings, although some leniency in this requirement is allowed.<br />


Possibly the most important area of research at the Enlisted Evaluation Center is the validation<br />
studies of the Commander's Evaluation Report and the MOS Evaluation Tests, which are conducted<br />
routinely. Formal validity reporting is quite comprehensive and includes the validation of both the<br />
CER and the Evaluation Test as well as the interrelationships between these instruments. Due to the<br />
lengthy processing time required for the formal validity report, a preliminary reporting procedure is<br />
employed to provide maximum data for test revision.<br />

The purpose of a preliminary validity report is to provide guidelines in test revision based on<br />
pertinent statistical data. Thus validity data covering the entire evaluation test outline and<br />
individual test items are provided prior to test revision to allow for their use in the<br />
decision-making process of test development. Emphasis is now placed on the WHY and HOW of test<br />
validity.<br />

Statistical bases are provided to allow control in test revision over the evaluation test<br />
characteristics of mean, standard deviation, reliability, and validity. The test characteristic of<br />
validity is given special attention and is considered at both the item and test outline levels. The<br />
statistical rationale and data necessary for this purpose are provided to enable their use in test<br />
development procedures.<br />

The users of the information and data provided in the validation report should: (1) have a thorough<br />
working knowledge of the interrelationships between item and test statistics to enable more control<br />
in test revision over evaluation test means, standard deviations, reliabilities, and validities;<br />
(2) insofar as is practicable, include items of substantial validity in revised tests; (3) study<br />
items of substantial validity to determine the particular types of items which tend to be most valid<br />
for a given Evaluation Test; (4) if outline revision is deemed appropriate, make such revisions in<br />
view of known interrelationships between Broad Subject-Matter Areas; and (5) reconcile item requests<br />
with both optimum item allocations by Broad Subject-Matter Areas and considerations of a practical<br />
nature. In this way, maximum control of the test results can be attained.<br />

A Comparative Study of a Short Form and a Long Form of a Performance Dictation Test for Legal Clerk<br />
or Court Reporter is a research project presently being conducted at the Enlisted Evaluation Center.<br />

This project was designed to determine whether a short form of the Dictation Performance Test may be<br />
reliable enough to use in place of the longer form, thus saving administration and scoring time as<br />
well as providing a more manageable measurement of the selected examinees. The short form is composed<br />
of preselected sections of the dictation test comprising 40% of the total test. In a preliminary<br />
analysis, a Pearson r correlation coefficient of .95 was obtained between the long form and the<br />
experimental short revision. It would appear that further examination of this relationship will prove<br />
fruitful, and the recommendation for the shorter test can be anticipated.<br />
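The preliminary analysis amounts to a Pearson r between paired long-form and short-form scores; the paired lists below are invented stand-ins for the dictation data.<br />

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

long_form  = [88, 74, 91, 66, 79, 85, 70, 94]   # full-test scores
short_form = [35, 30, 37, 26, 31, 34, 29, 38]   # scores on the 40% subset
print(f"r = {pearson_r(long_form, short_form):.2f}")   # → r = 0.99
```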


Another group of continual research studies lies in the Investigation of Possible Test Compromise.<br />
One study in this area was the development of investigation procedures.<br />

An MOS Evaluation Test would be compromised if one or more copies of the test booklet came into the<br />
possession of an unauthorized enlisted man, and the information was used by him before or during the<br />
administration of the test. An MOS Evaluation Test is subject to possible compromise if it is lost or<br />
is unaccounted for prior to the completion of an MOS evaluation period.<br />


In this study three methods for investigating possible compromise in a large military program in<br />
which tests are administered once a year were developed and experimentally tested. Compromise may be<br />
checked by (1) comparison of test scores of individuals or possible compromise groups with population<br />
parameters; (2) standardization and analysis of test scores over two test periods; and (3) regression<br />
analysis of test scores over two test periods. From the resulting statistical analyses the<br />
limitations and advantages of each method were shown, as well as a rationale for the interpretation<br />
of results and the formulation of subsequent administrative decisions and recommendations. Presently,<br />
any one of these methods may be used depending on the circumstances.<br />
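Method (3) can be sketched as follows; the score pairs, the flagged group, and the size of the excess that would raise suspicion are invented for illustration.<br />

```python
# Regress second-period scores on first-period scores, then compare a
# group's observed second-period mean with its regression prediction.
def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx       # slope, intercept

period1 = [50, 55, 60, 65, 70, 75, 80, 85]      # first-period group means
period2 = [52, 56, 63, 66, 72, 74, 83, 86]      # second-period group means
slope, intercept = fit_line(period1, period2)

group_p1, group_p2 = 60.0, 75.0                 # a possibly compromised group
predicted = slope * group_p1 + intercept
print(f"predicted {predicted:.1f}, observed {group_p2:.1f}, "
      f"excess {group_p2 - predicted:+.1f}")
```

A large positive excess, judged against the scatter of the regression, is the signal that would trigger an administrative inquiry.<br />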

Two more recently commenced studies are A Comparison of Six Methods of Item-Test Correlation and A<br />
Comparison of Four Evaluation Procedures for Measuring Performance Efficiency. The first research<br />
project was based on Guilford's (1950) study in which he compared the biserial r, the point-biserial<br />
r, the ordinary tetrachoric r, the Flanagan tetrachoric r, and two applications of the phi<br />
coefficient as methods of item-test correlation. The point-biserial r proved to be the superior<br />
method. Therefore, this project was designed to replicate and extend Guilford's study, since the<br />
point-biserial correlation is used in item analysis at EEC. This project will either substantiate<br />
Guilford's findings and give further proof of the value of the point-biserial method, or it will<br />
revise his results and indicate possibly another more accurate index of item-test correlation. No<br />
data have been collected as yet.<br />
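The point-biserial index itself is straightforward to compute; this sketch uses the familiar mean-difference form of the coefficient, with invented responses.<br />

```python
def point_biserial(item, totals):
    """r between a 0/1 item and total score: (Mp - M)/sigma * sqrt(p/q)."""
    n = len(item)
    m_all = sum(totals) / n
    sd = (sum((t - m_all) ** 2 for t in totals) / n) ** 0.5
    p = sum(item) / n                                # proportion passing
    m_pass = sum(t for i, t in zip(item, totals) if i) / sum(item)
    return (m_pass - m_all) / sd * (p / (1 - p)) ** 0.5

item   = [1, 1, 1, 0, 1, 0, 0, 0]               # pass/fail on one item
totals = [38, 35, 33, 30, 29, 26, 24, 20]       # total test scores
print(f"r_pbis = {point_biserial(item, totals):.2f}")   # → r_pbis = 0.79
```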

The second study noted above can be classified as a criterion study. Sufficient procedures are used<br />
by EEC for increasing and maintaining the reliability and content validity of the MOS Evaluation<br />
Tests. However, a number of problems are encountered in the establishment of concurrent or predictive<br />
validity since an adequate criterion is necessary. The EEC has developed a rating scale which is<br />
presently being used for getting criterion data via co-worker or peer ratings. However, more<br />
satisfactory criterion data may be accessible by using other measurement procedures.<br />

The purpose of this project is then to compare four methods of measuring performance efficiency to<br />
learn whether a superior rating<br />


procedure does, in fact, exist. This information will be valuable in the quest for the most<br />
satisfactory criterion measurement procedure to be used in validation studies.<br />

The four rating procedures to be used as criterion measures of performance efficiency are self<br />
ratings, peer or co-worker ratings, supervisory ratings (first and second level), and<br />
group-specialist ratings. These evaluations will all be obtained on identical rating forms comprised<br />
of an appraisal of the subordinate's "overall performance" and a checklist of his overall<br />
job-proficiency qualifications. These data, when collected, will be analyzed in whole as well as by<br />
subparts to derive the most meaningful information from them.<br />

These have been just some of the more pertinent research projects conducted at EEC. Many other<br />
research projects have been done in the past, and many more are now in the development stage. It is<br />
believed that only through diligent and carefully designed research programs such as these will<br />
improvement in our evaluation system be realized.<br />


Test Construction Procedures<br />

wILLI\c’ .,.. ‘: tP*?rE, ..a CIIA,.NN ,<br />

US Army Enlisted Evaluation Center<br />

The test construction procedure is an especially crucial one in obtaining a valid instrument,<br />
especially for tests such as ours that usually cannot be given experimentally prior to operational<br />
use.<br />

For evaluation activities to be most effective, they should consist of the best possible techniques,<br />
used in accordance with what we know to be the best and most effective psychological principles.<br />

One feature that distinguishes reputable work in test development from that of the mass of<br />
self-styled "test constructors" or "test experts" and outright quacks is that the reputable worker in<br />
the field is continuously concerned with testing, verifying, and improving the adequacy of his<br />
procedures. He knows that he does not know all the answers, and he is ever on the alert to find out<br />
more and to improve his procedures. There is no easy road to scientific test construction. The road<br />
is long and tortuous and beset with many pitfalls.<br />

In our types of testing programs, there usually is no test available that corresponds satisfactorily<br />
to a function which seems important to test. We as test psychologists, subject-matter experts, and<br />
coordinators are then truly put upon our own mettle to originate improved patterns of test<br />
performance and to develop a crude test idea into a practical and reliable testing instrument. This<br />
constitutes the most exacting and, at the same time, the most interesting and rewarding phase of test<br />
development work. It requires truly creative efforts.<br />
work. It requires truly creative efforts.<br />

The topics to be presented by the four symposium members are somewhat diverse in nature and should<br />
stimulate our thinking in the direction of improved test construction procedures. At this time, I<br />
would like to introduce the four symposium members:<br />

Mr. Isadore J. Newman, 6570th Personnel Research Laboratory, US Air Force<br />

Mr. John Crediford, US Naval Examining Center<br />

Mr. Charles E. Cassidy, US Army Enlisted Evaluation Center<br />

Mr. Fred B. Honn, US Army Enlisted Evaluation Center<br />

Since the four papers to be presented by these gentlemen are somewhat diverse, and to keep our<br />
thinking "warm" relative to the particular topic, we will have a short question-and-answer period<br />
following each presentation.<br />



Educators and trainers are continually being challenged to project knowledges that enable<br />
individuals to not only adjust to their environment but master it. Unfortunately, there are notable<br />
differences among some educators and trainers in their capacity to perceive that a challenge exists.<br />
Information is so rapidly accumulating that it is imperative that we seek new methods of<br />
dissemination. Since so much of man's diverse behavior is a result of learning, we must find a system<br />
by which verified facts and relationships may be projected to him in a systematized effort.<br />

Educators and trainers must select the method of instruction which<br />



series of tasks which their mission requires. The commands develop their own measuring instruments to<br />
evaluate their own on-the-job training programs. Our unit has the mission of developing the measuring<br />
instrument for USAF which, under the new concept, will measure the knowledge resulting from studying<br />
the subject matter of the career development courses. Whereas in the past we were charged with the<br />
responsibility of measuring an airman's knowledge of his entire specialty, we will be limited to<br />
measuring only that knowledge which is based on the material concerned with principles and<br />
fundamentals found in the career development course. The Air Force Training Command people who<br />
monitored this new concept for the Air Force have stated: "Modifications of present methods will be<br />
used for testing knowledges learned and skills developed. Specialty Knowledge Tests (SKTs) will<br />
become in effect 'end of course' tests for airmen completing a career development course. The test<br />
will cover only the content of the self-study materials. The writers of course materials will assume<br />
a new role which will require more carefully planned approaches to the development of effective<br />
courses. Specialty Knowledge Tests will adhere to what the student has been presented in his course.<br />
If he has learned the material well, he should be successful in passing the SKT." With this<br />
background I am projecting the thesis that a systems approach might be utilized in the whole training<br />
technology, into which our test construction process might be integrated as one of the subsystems.<br />
systems,<br />

The Air Training Command has described the development of a systems approach as one that "... views<br />
the many individuals and groups developing a particular weapons system as individual components, like<br />
cogs in a machine working together to achieve a common goal." This approach requires "(1) the<br />
definition in precise terms of each person's job; (2) a task analysis; (3) a specification of<br />
performance requirements and tolerance limits; and (4) a statement of the necessary interactions and<br />
communications to be carried out between groups -- each requirement established to meet the<br />
predetermined system goal." (Ofiesh, 1964)<br />

In this framework, the Air Training Command (a subsystem itself in overall training and evaluation)<br />
requires a systems approach to include a task analysis as the second step in the sequence. To be<br />
useful for the test construction process, these tasks must be stated in measurable behavioral terms.<br />
A criterion test must then be developed so that a starting point in the training may be determined.<br />
The training materials are then started at this point of departure and are carried forward to the<br />
desired skill level. Our unit then takes over as one subsystem to construct a measuring instrument<br />
which purports to measure the knowledge which the examinee has acquired at a specific skill level.<br />
This subsystem ranks each student according to how much knowledge he has acquired relative to other<br />
students, the resultant being a percentile score.<br />


The systems concept, to be most effective, demands a single manager to align the subsystems so that<br />
each meshes with the others as it evolves into the single system. After Hq USAF has certified that<br />
the contents of a specific Specialty Description correlate significantly with the relevant tasks<br />
involved in the job, Air Training Command breaks the job down into tasks and levels of proficiency<br />
which are used as a Job Training Standard. If these two documents, which we consider subsystems, are<br />
significant, then the subsystem charged with writing the training materials is off to a good start in<br />
making its contribution to the total system. Assuming that each subsystem has been properly<br />
constructed and is properly coordinated with the other subsystems, work in the SKT testing subsystem<br />
should proceed smoothly, since we would then be charged only with constructing an instrument that<br />
measures how well an airman has mastered the materials found in a career development course. With the<br />
system working optimally, with each subsystem making its proper contribution, there need be no<br />
question or concern over what the SKT is measuring. Under such circumstances it will be measuring the<br />
objectives and criteria established by the subsystem charged with preparing the career development<br />
course. At this point it must be remembered, however, that the SKT is only sampling representative<br />
areas of a task as determined by the test construction subsystem.<br />

Zaccaria (1963) has said in discussing the problem of measurement, "Too often students are measured to fractions of a percentage point against other students without ever being measured against minimum job requirements. There are two main reasons for this phenomenon. First, training objectives are seldom stated in definitive enough terms, and second, a relative rather than an absolute measurement system is employed."

This leads us to a discussion of the uses for which an evaluation is made. It is important to the whole system that this use be specifically announced so that it will be one of the objectives for each subsystem to keep in mind while making its contribution. If Dr. Zaccaria's criterion-based evaluation is used, it will result in a certain group being found proficient without knowing how proficient. If the percentile rank is used, it will tell us the relative standing of each individual in the group. It is up to the system manager to decide which evaluation correlates best with the use for which the measuring instrument is designed.
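The two kinds of evaluation just contrasted can be sketched side by side; the cut score and raw scores below are invented for illustration:

```python
def criterion_result(score, cut_score):
    """Absolute (criterion-based) evaluation: proficient or not,
    with no indication of how proficient."""
    return "proficient" if score >= cut_score else "not proficient"

def relative_standing(scores):
    """Relative evaluation: each examinee's standing in the group (1 = best)."""
    ordered = sorted(scores, reverse=True)
    return {s: ordered.index(s) + 1 for s in scores}

scores = [58, 72, 72, 91]
print([criterion_result(s, 70) for s in scores])
print(relative_standing(scores))
```

The first routine answers only "does he meet the minimum?"; the second only "where does he stand in the group?" -- which is exactly the choice the system manager must make.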

A systems approach will work only when each subsystem is working toward the same goal under the direction of a single manager.

124


Ceoly, W. D., & Crain, J. L. Dual concept of on-the-job training. USAF Instructors' Journal, 1 (Nr. 1), July 1963, 13.

Dressel, P. L. Evaluation procedures for general education objectives. Educational Record, April 1950, 97-122.

Hilgard, E. R. Theories of learning. New York: Appleton-Century-Crofts, 1956, ix.

Judy, C. J. Achievement testing in the Air Force. USAF Instructors' Journal, 1 (Nr. 1), July 1963, 17.

Lindquist, E. F. Educational measurement. Washington, D. C.: American Council on Education, 1951.

Mayer, Sylvia R. Research on automated training at Electronic Systems Division. In Trends in programmed instruction. National Education Association, 1964, 149.

Ofiesh, G. Air Training Command's Systems Developmental Approach to Instructional Materials.

Vitola, B. M., & Newman, L. J. A comparison of two instructional methods. Paper presented at Our Lady of the Lake College, San Antonio, Texas, 1964.

Zaccaria, M. A. Reappraisal of achievement measures. USAF Instructors' Journal, 1 (Nr. 1), July 1963, 73.

125


Summary of
Pragmatic Creativity in Examination Construction

A Paper Delivered by
John Crediford
US Naval Examining Center

Mr. Crediford spoke on the pragmatic utilization of the purely divergent thinking of nonprofessional item writers. Because of a change in the time schedule, his prepared material was summarized as follows: After describing the idea as one of the greatest potential forces in today's testing, a plea was made for the deliberate introduction of opportunities for free-flowing thinking into our testing situation. Several tests were described that were the results of placing nonmilitary item writers in the position of creating their own tests in a purely permissive environment. Learning by doing, unhampered by the restrictions of classical precedence, they produced several noteworthy tests, including a music test on tape administered to grammar school children, a military supervisory test, a questionnaire used in selecting brig guards, and a test for the selection of nonbiased supervisory personnel in industry. In line with the topic of the discussion, the greatest benefit from the paper was in the very fluent discussion which followed.

126

126



Evaluation of Motor Skills

A homely little anecdote helps to clarify my objectives in this paper. My father was born in Ireland, and I recall a story he told me many years ago about a fellow "greenhorn" whose working hours required him to return home in the darkness of late evening. His course led him through a particularly dark and deserted district in which anyone with a lively Irish imagination could envision the direst of misfortunes taking place. Every night as he passed through this area, he constantly repeated aloud, "Praise the Lord and the Divil ain't a bad man either." In like vein I would like to pay due respect to the written test for its past contributions to the field of achievement testing and its prospects of even greater utility in the future, but at the same time, to recognize that the testing of motor skills does offer a great potential to be explored in our constant efforts to create increasingly more effective measurement instruments.

The evaluation of motor skills will be treated in a quite broad context. The coverage includes all activities pertaining to or involving muscular movement, not to exclude those requiring previous, concurrent, and subsequent cognitive processes. In effect, performance testing in its most comprehensive application will be considered. All test situations in which the examinee is required to do something other than take a paper-and-pencil test will be included. The use of driving tests as a prerequisite for obtaining a driver's license in many states is a good indication of the wide public acceptance of performance tests. Although one might question the validity of these tests as typically administered, there can be no doubt that a driving test is a particularly good example of a test of motor skills.

The purpose of subject-matter or achievement testing normally is to provide an evaluation of the level of job mastery attained by the examinee in a given job, and frequently to rank a group of individuals in regard to their relative success. Depending upon the nature of the activity to be measured, and frequently certain extraneous restrictions, one test or a battery of two or more types of tests may be used. In the selection of measuring instruments we place emphasis upon the presence of three basic qualities: (1) validity, or measuring what we want to measure; (2) reliability, or consistency of measurement; and (3) objectivity, or the exclusion of personal feeling not based upon accurate observation. We have at our disposal a variety of measuring instruments:
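Of these three qualities, reliability lends itself most directly to computation. A minimal sketch of test-retest consistency using the Pearson correlation (the paired scores are invented, not taken from any test described here):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Same four examinees measured twice with the same instrument;
# a coefficient near 1.0 indicates consistent measurement.
first_administration = [60, 72, 81, 90]
second_administration = [62, 70, 84, 88]
print(round(pearson_r(first_administration, second_administration), 3))
```

Validity and objectivity, by contrast, depend on judgments about what is measured and how it is observed, and cannot be reduced to a single formula so easily.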

Written test - The multiple-choice type of written test has become almost universally accepted as the basic measure. It possesses the

127



advantages of very complete coverage of the appropriate subject matter on the verbal level in a comparatively short time, simplicity and ease of administration, relatively low cost, simple scoring procedures, and finally, it is suitable for presenting problems that measure many types of abilities.

Performance rating - This consists of an evaluation of the individual's performance on the job by his supervisor; in the Army it is called the CER or Commander's Evaluation Report. Although ratings of this nature are susceptible to subjective elements, the design of the rating form attempts to direct the rater's attention to objective observation of behavior.

Performance test - This may be in the nature of a work sample such as a typing test, a tapping test to measure an aptitude such as finger dexterity, or a situational test in which actual work problems are simulated. Certain selection and promotion boards can be properly considered performance tests, as can the selection interview in an employment office. This is true when the interviewer or board is attempting to evaluate behavior required in the performance of job duties.

Evaluation of experience and training - In some government jurisdictions a score is given to each applicant for selection or promotion based upon quality and quantity of job-pertinent experience, training, and special recognition such as awards received. This score becomes part of the examinee's final rating.

It is obvious that all jobs require some basic motor activity. Certain movements of the feet, hands, eyes, and other parts of the worker's anatomy are required to reach the work situation, to position himself for the performance of his duties, and to control and manipulate the physical tools of his trade. The absolute requirement of some degree of motor skill is present from the most sedentary of occupations to those that require almost constant motor activity. For our present concern, that of testing military personnel in their occupational specialties, the broad spectrum of jobs is divided into three classes. This purely arbitrary taxonomy, based upon the relative importance of motor activities in discriminating between levels of job mastery, provides a starting point in the determination of the need for a test of motor skills. Class 1. A large group of service personnel are required to perform duties that require only the basic motor skills, the possession of which can be assumed from their acceptance into the service. Although motor coordination contributes something to the efficiency with which they perform their duties, its importance is overshadowed by the importance of cognitive functions which are central to the execution of their normal work requirements. Administrative and general clerical occupations are examples of this group. It appears that a performance test would contribute little, if anything, to the evaluation of these personnel. Class 2. A large number of service

128


personnel are assigned to jobs the duties of which require motor skills of a more specific nature than those in the previous category. These skills will usually have been acquired in one of the service schools or, in some cases, will have been possessed by the individual upon entry into the service. The motor aspect of these skills might contribute substantially to the quality of the performance of their job duties, but cognitive abilities are of central importance. The ability to smoothly manipulate tools, position physical objects, and perform manual movements in the adjustment or operation of equipment is required, but the evaluation and diagnosis of the situation requiring these physical skills is a better determiner of competence. In this area, the question is whether the central skills can be measured adequately by a written test, or whether some form of performance test, such as the performance checklist, should be made part of the total evaluation. This decision must be made by the test psychologist working cooperatively with subject-matter experts in the appropriate field. Before arriving at a final decision, pertinent validation data and frequently experimental data must be considered. Various types of repairmen, mechanics, and equipment operators are examples of personnel whose jobs are of this nature. Class 3.

A comparatively small number of jobs exist in which motor skills appear to be crucial discriminators between levels of job mastery. These positions are typified by duties that involve constant, repetitive activities that lend themselves readily to quantitative and/or qualitative measurement. More important than the fact that these job activities are readily measurable is the knowledge that studies of performance on the job compared with scores on the typical written tests have often indicated that there is little correspondence between the knowledge of what to do and the ability to do the work quickly and accurately. For jobs in this category a completely adequate measurement of job mastery must include a performance test. Typist and stenographer are examples of jobs that are included in this class.
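The three-class taxonomy above is, in effect, a small decision rule. A sketch of that rule follows; the function and its two yes/no questions are a simplification drawn from the text, not an actual service procedure:

```python
def job_class(motor_skill_discriminates_mastery, specific_motor_skills_required):
    """Assign a job to one of the three arbitrary classes described above."""
    if motor_skill_discriminates_mastery:
        return 3  # e.g., typist, stenographer: performance test required
    if specific_motor_skills_required:
        return 2  # e.g., repairman, mechanic: decided case by case
    return 1      # e.g., administrative, clerical: written test suffices

print(job_class(False, False))  # administrative or clerical job
print(job_class(False, True))   # repairman or equipment operator
print(job_class(True, True))    # typist or stenographer
```

Real classification, as the text emphasizes, rests on complete job analysis rather than two binary questions.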

The decision as to whether or not the evaluation of a particular job should include a performance test rests entirely upon the nature of the elements of the job and their susceptibility to measurement by a written test. This decision can be made by the test psychologist only when complete job analysis information is available. The analysis of the job provides a list of duties which are required for adequate job performance and must serve as a starting point in the determination of the types of tests appropriate to constitute a complete evaluation. The activities must be defined and analyzed, and the elements necessary for job success must be isolated. This process involves the division of the job into its basic elemental components, a process which some Gestaltists will find objectionable; however, if a particular element of a job is not adequately
129


measured by a written test, we are challenged to find a way to evaluate this factor. We can concern ourselves with emergents when we have conquered the constituents. Basic criteria for job success such as quantity of output, quality of output, accuracy, spoilage, and safety factors must be determined. Decisions must be made as to whether the performance will be measured in terms of product or process. If it is to be product, should we establish standards amenable to objective measurement such as size and weight, or will subjective measures such as smoothness, color quality, and symmetry serve as better determiners of competence? If an emphasis is placed upon process, should we be primarily concerned with use of tools, proper work methods, or work sequence? The selection of universally superior work procedures is not easy. Should a mechanic who can consistently analyze a malfunction correctly simply by listening to the engine be penalized because he doesn't use the generally accepted tools and work methods? This is only one example of the type of problems to be solved in the construction of a good performance test, but it is a good indicator of the obstacles that a test constructor must hurdle.

Aptitude - This is a test of a motor skill that attempts to predict success in a particular activity or potential for benefiting from training. Aptitude tests are not job achievement tests and are mentioned here only because they are one type of performance test and in some situations can be of considerable value. The distinction between aptitude and achievement tests is not always crystal clear. The use of the test rather than its nature is the important factor. The use of a subject-matter test to predict success in a position of a higher level is closely related to the use of the typical aptitude test.

Achievement Tests

Work sample - This test provides the examinee with a typical performance situation appropriate to the job for which he is being evaluated, including a task or group of tasks characteristic of that required for actual job performance. A work sample is not actually a piece of a job. Some parts of any job would yield little statistically useful variation in performance; other parts might not be adaptable to a testing situation. A good work sample must differentiate between good and poor workers, and provide scores reflecting degrees of proficiency. This is possible only if a fair sample of crucial determiners of job success is included, requiring the examinee to demonstrate his acquired skills using the tools, materials, and methods characteristic of his job. Tests of this type have been developed and utilized in the EEC for typists, stenographers, bandsmen, court reporters, and radio code receivers.

Situational performance tests - These tests do not attempt to measure a simple activity, but one which is rather complex and less well defined and isolated than the work sample. A group oral test in which such
130


131

personality factors as dominance, leadership, judgment, and emotional stability are evaluated is an example of this type of test. Problem-solving ability has also been measured in this way by presenting the examinee with a unique simulated situation of the nature he might encounter on the job and rating his handling of the situation. The process test, which is primarily concerned with proper work procedures and sequences, and the performance checklist are also types of the situational performance test. Mr. Claude Bridges of EEC is presenting a paper at 1330 hours this afternoon in the West Auditorium on the subject of performance checklists. Because of the complexities of administration and scoring, high cost, and the availability of a variety of written tests covering the appropriate subject matter, the situational performance test has not been widely utilized in achievement testing.

Performance tests for evaluating work proficiency should be used only when a group multiple-choice test cannot provide an adequate measure. In many instances the mastery of the job can be inferred from the fact that the individual possesses sufficient knowledge to perform his job duties. Certain jobs, which by their nature are centrally concerned with a repetitive, manual activity, require a performance test to supplement the written test in providing a complete evaluation. A typist, for example, might have a good knowledge of the various parts of the typewriter and their function, but experience has shown that this knowledge is not highly correlated with typing ability as measured by a typing performance test or performance on the job. Jobs of this nature represent only a relatively small percentage of jobs in the military services. The most fruitful area for further research appears to be that of the situational performance test, or more specifically, the performance checklist. As a starting point, more comprehensive job analysis is required, job elements predicting success must be defined, the procedural versus the end-product problem must be resolved, and better methods of scoring must be developed. The old bugaboos of increased expense, greater expenditure of time, and difficulty in providing adequate and equitable test sites are still with us. The competitive group oral and situational problem-solving tests offer encouragement in evaluating certain hard-to-measure personality traits; however, their principal value will probably be in certain specific unusual test situations rather than in the area of job proficiency testing.



Performance Test Construction

US Army Enlisted Evaluation Center

Having established the need for a performance test, we must first give due consideration to the objectives of all performance tests. Our prime objective is to measure the levels of proficiency in those critical skills which are not measured adequately by established written tests. Our secondary objective is to determine if the examinee achieves (demonstrates) skills which are important, or critical, to minimal performance on the tasks of the job. That is to say, we must identify motor (manual) skills that are to be sampled by our performance test in these categories: first, those skills which are critical in distinguishing different levels of job mastery; second, those which are essential for acceptable performance of the motor tasks of the job; and finally, those skills which cannot reasonably be measured by written tests.

Our first step is to identify which job tasks require these skills previously mentioned. From such a very comprehensive study of the job tasks we will be able to pinpoint the actual test task. Think of the test task as our road map which tells us how to get to where we are going.

To assure that our analysis is complete and accurate, we use the task analysis method. This commonly entails the accomplishment of a form utilizing five columns. The headings for these columns are: (1) ACTIVITY STEPS (what the worker actually does); (2) PROCEDURE (how the tasks must be done); (3) CARE AND USE OF MATERIEL (the tools and equipment with which the tasks are done); (4) SAFETY AND SPECIAL PRECAUTIONS; and (5) CONCLUSIONS (results of having done these tasks, or work samples). Performance tests are used for several purposes. Providing a criterion measure is one goal for performance testing in the armed services. The use of the task analysis method is the best assurance we have for achieving this goal.
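The five-column form maps naturally onto a simple record structure. A sketch follows; the field names merely mirror the column headings above, and the sample row is invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class TaskAnalysisRow:
    """One row of the five-column task-analysis form."""
    activity_steps: str      # (1) what the worker actually does
    procedure: str           # (2) how the task must be done
    materiel: str            # (3) care and use of tools and equipment
    safety_precautions: str  # (4) safety and special precautions
    conclusions: str         # (5) results of having done the task

row = TaskAnalysisRow(
    activity_steps="Adjust carburetor idle mixture",
    procedure="Warm engine; turn mixture screw to peak rpm",
    materiel="Screwdriver, tachometer",
    safety_precautions="Keep hands clear of fan and belts",
    conclusions="Engine idles smoothly at specified rpm",
)
print(row.activity_steps)
```

A completed analysis is then a list of such rows, one per task, from which ratable points can be drawn.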

Once this analysis has been completed, we are in a position that enables us to see the whole picture. Hence, we can more readily identify critical points; for example, we can mark a point as not measurable or not a ratable point, or label a point as very critical. Actually, the task analysis (some prefer "work sample") becomes the source of items, ratable points, for our experimental instrument. Further, this analysis greatly aids our selection of the correct type of test instrument. Will it be a final product, a process, or even a combination of product and process test?

132


The second area of operation for performance testers is the construction of the experimental instrument. As we start the process of selecting ratable points based upon our analysis, there are at least three criteria which should guide our selection. The first criterion is representativeness: the activities being measured are realistic (joblike), typical of the tasks and skills performed by a qualified specialist on the actual job. Performance testers are frequently tempted to substitute unrealistic tasks, for a variety of reasons, and often sacrifice all or almost all of the criterion of representativeness. (This is not a facetious reference to honest recognition tests or simulated-conditions tests.) Our second criterion is reliability: consistent results yielded for a respectable range of scores, enough so that a standard can be developed for world-wide usage. Third, we test constructors must be mindful of the criterion of practicality: feasible to use in terms of the time, equipment, personnel, and expense to administer and to score.

Having selected the "items" for our experimental instrument in accordance with the three criteria, we must next develop the rating scale needed to evaluate the tasks to be measured. Basically, we have two types of scales from which to choose: the forced choice (go/no go, did/did not), or the degrees of skill, which can take any of several forms, such as numerical (1, 2, 3, 4, 5) or a descriptive word scale (for example, poor, good, average, etc.). A possible third scale, physical characteristics of the final product, is sometimes used. Remember, it is easy to go overboard with numerical scales, as K. L. Ecan (1953) points out. Only those raters who are exceptionally well qualified should attempt using a scale of more than five points. The nature of the items of our instrument will largely dictate which scale is more appropriate. A word of caution at this point is in order. We must be careful to plan possible weighting procedures with an eye to checking them against our criteria. As a rule of thumb, complicated, involved weighting should be avoided. Many promising tests have been invalidated by cumbersome weighting.
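The two scale types, and the rule of thumb about weighting, can be sketched as scoring routines; the checklist results and weights below are invented for illustration:

```python
def go_no_go_score(point_results):
    """Forced-choice (go/no go) scale: count the ratable points passed."""
    return sum(1 for passed in point_results if passed)

def degrees_of_skill_score(ratings, weights=None):
    """Degrees-of-skill scale (e.g., 1-5 per point). Weights are optional
    and, per the rule of thumb above, should stay simple."""
    if weights is None:
        weights = [1] * len(ratings)
    return sum(r * w for r, w in zip(ratings, weights))

print(go_no_go_score([True, True, False, True]))     # 3 of 4 points passed
print(degrees_of_skill_score([5, 3, 4]))             # unweighted total
print(degrees_of_skill_score([5, 3, 4], [2, 1, 1]))  # lightly weighted total
```

Anything more elaborate than such simple multipliers should be checked against the three selection criteria before it is adopted.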

Our next major area of activity is the preparation of instructions for the examiners and the examinees. Basic contents of the examinee instructions should include a description of the task to be done, a list of the points of the task that will be scored, all the tools and/or equipment to be used, a time limit (if applicable), and any other necessary instructions.

Much more detailed instructions are necessary for our examiners. Not only must we cover the description of the tasks, the materiel list, and time limits, but also tips regarding careful observation of the examinee's performance, especially precautions for a step which is, or could be, difficult to observe. Further, these instructions must be explicit regarding the recording of results with the rating scale. We



… knows what to do if the examinee makes …
…, or uses an alternate method, or starts …
… result in injury or damage to equipment.
… this quote from Adkins, et al. (1947):
"… by explicit instructions to the …
… ratings and what to look for."

… The next activity in the experimental …
… constructors use the standard test-retest technique …
… instrument. If applicable, retesting …
… helpful. A critical point to remember …
… range of examinees--from those who …
… to the most capable ones. (Ideally, …
… would be divided into two groups …)
We must be sure to check and recheck …
… of our instrument. Research has shown …
… can pay big dividends in the area of …
… reliability.
… our tryout include help in determining …
… the test; help in determining the minimum …
… asking for the Army. We would be helped in …
… Commander's Evaluation and the ET (written …
… tryout results most surely will lead us …
… the instrument and, possibly, even some …
… tryout takes us a gigantic step toward …
… to grips with realistic analysis and …
This necessitates considerable coordination …
… and field personnel, not to mention the …
… The purpose of this thorough coordination …
… refine our experimental instrument until …
…, reliable test instrument. We will have …
… administrations of the official instrument …
… and eliminate problems not previously anticipated …
… standard forms, scoring keys, manuals, etc.
… statistical analysis of test results in order …
… instrument for use on a world-wide basis.
.<br />

Adkins, D. C., Primoff, E. S., McAdoo, H. L., Bridges, C. F., & Forer, B. Construction and analysis of achievement tests. Washington, D. C.: Government Printing Office, 1947.

… personnel tests. New York: …

…ment. Washington, D. C.: …


GENERAL SESSION - MRS. GENEVIEVE SCHULTER, CHAIRMAN
US NAVAL EXAMINING CENTER

One Interpretation of the Major
Goals of Specialty Knowledge Testing in the
United States Air Force*

STEPHEN W. FORES
6570th Personnel Research Laboratory, US Air Force

Maybe it's an occupational disease afflicting some of us who work on the assembly line grinding out the tests. But it sometimes happens that we get so wrapped up in the daily routine of the job and so preoccupied with the short-run goals that we're apt to lose sight of the major goals of the task. We become so involved in the workaday methods that we find it hard to unscramble the ends from the means. So, from time to time, it is worthwhile to climb aloft and take a fresh look at our reference points and renew our perspective. It is occasionally necessary to forget the mode of travel and concentrate on the destination.

IN SUPPORT OF THE AIRMAN CLASSIFICATION SYSTEM

The expressed goal of the Air Force specialty knowledge testing program is to evaluate the technical knowledge possessed by airmen as required for qualification under the Air Force enlisted personnel classification system. Towards this goal, the Specialty Knowledge Test (SKT) is provided as an Air Force-wide standard of measure by which to determine job knowledge, apart from job performance as such. The SKT is applied not only laterally by career specialty but also vertically by skill level--namely the apprentice or semiskilled, the journeyman or skilled, and the advanced levels of qualification. As a criterion for skill upgrading, the SKT is intended to supplement--but not supplant--other criteria, such as demonstrated proficiency on the job, job experience and history, supervisor's recommendation, and commander's approval. Thus the SKT is by no means intended to be the sole criterion for upgrading.

Towards managerial control. In effect, the SKT serves as a managerial control device whereby Headquarters USAF is enabled (1) to ensure that the airman manpower resources meet the established minimum requirements in terms
*It is emphasized that this is one individual's interpretation. This paper does not necessarily reflect the official policy of the Air Force. Nor does it necessarily reflect the position, whether official or unofficial, of any major air command. Appreciation is especially due Lt Col Albert S. Knauf, USAF, for the stimulating dialogue that evoked many of the observations noted herein.

136



of technical knowledge, and (2) to accomplish a measure of standardization of airman knowledge on an Air Force-wide basis.

Quality control. It is axiomatic that knowledge is power. Nowhere is the truth of this axiom better founded than in a modern military organization, in which the success of the mission depends on the qualitative superiority, rather than quantitative strength, of its manpower. Knowledge, then, may be regarded as a form of resource. Like materiel resources, technical knowledge is subject to deterioration, depletion, and obsolescence. To be maintained in a constant state of readiness, knowledge must be continuously renewed, restored, and cultivated. Unlike materiel resources, however, knowledge is not readily amenable to measurement for inventory purposes. Nevertheless, through the SKT program, the Air Force seeks to maintain a close check on the job knowledge resources that reside in the enlisted manpower population.

Standardization: one AF language. Through standardization of knowledge, the Air Force seeks to neutralize the hazards of specialization and division of labor. One hazard is the propensity of each organization to develop its own concepts, its own doctrine, its own private language of specialized terminology and nomenclature. One could fancy the emergence of an Air Force tower of Babel as the theoretical outcome of this tendency, carried to its ridiculous extreme. However, it should not be unrealistic to credit the SKT with making a notable contribution to the cause of standardization of knowledge. Acting as a vital stimulant to the currency of a common technical language, the SKT helps keep the Air Force family of commands on the same wavelength for purposes of communication.

Standardization: the "compleat" airman. Another hazard is the tendency for each major air command to foster that job information that is directly relevant to its own mission, to the virtual exclusion of broader areas of knowledge having wider applicability to the general Air Force mission. This is understandable inasmuch as each command is under relentless pressure to meet the demands of its immediate mission. On top of their many burdens, the commands bear the donkey's share of the training burden. If an individual does a creditable job towards the fulfillment of the command mission, the command will naturally tend to want to overlook any knowledge gaps that might limit the individual's potential value to the Air Force at large. As long as he is a good SAC man, who cares if this man could ever be of any earthly or airborne use to MATS, or vice versa? Well, Headquarters USAF cares. And so should MATS care from the very standpoint of its own long-range interests. And so indeed should SAC. For the airman's breadth of knowledge stamps him at once with both his professional and his Air Force identity. It is the mark of his versatility and employability within his specialty. This quality in the airman largely relieves the Air Force of the need for retraining him extensively with each change of assignment--whatever the command, whatever the job, whatever the specific nature of the equipment involved.

137
Systematic upgrading. At the same time, the Air Force achieves other gains through the use of the SKT as a managerial control device. By channeling the upgrading process, the SKT regulates the flow of technically knowledgeable personnel up a host of career ladders. Helpers, apprentices, journeymen, technicians, supervisors, superintendents--all are kept advancing at an orderly pace, each group maintaining the prescribed distance of separation from the others. A kind of braking mechanism, the SKT keeps competition from disintegrating into a chaotic scramble and provides a hedge against the runaway inflation of ratings. A sort of traffic control system, the SKT sets maximum speed limits in the form of qualifying percentile scores.

Notwithstanding limitations. To be sure, the test does not have the effect of summarily and irrevocably eliminating failures in droves from any further competition. On the contrary, through retesting and board action, all but a negligible proportion of the failures eventually get by. What is important, however, is that the test does serve to keep the rate of progression within manageable limits for purposes of effective personnel selection. And to be sure, one can envisage a more flexible application of speed limits, depending upon what the traffic will bear. A rather permissive pass/fail ratio might be justified for a critically undermanned specialty, or a very restrictive ratio for an overcrowded one. However, such manipulation to accommodate variable supply-demand relationships among specialties would have to be carefully considered in the light of possible conflict with the goals of standardization and quality control.
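The idea of a percentile speed limit adjusted to what the traffic will bear can be made concrete. The Python sketch below sets a qualifying cutoff at a percentile of candidate scores and loosens it for an undermanned specialty or tightens it for an overcrowded one. The function name, the nearest-rank method, and every numeric parameter here are illustrative assumptions, not Air Force practice:

```python
# Hypothetical sketch: a percentile-based qualifying cutoff, shifted by the
# manning level of the specialty. All parameters are illustrative.

def qualifying_cutoff(scores, base_percentile=50.0, manning_ratio=1.0):
    """Return the minimum passing score for a specialty.

    manning_ratio < 1.0 (undermanned) lowers the percentile cutoff so more
    candidates pass; manning_ratio > 1.0 (overcrowded) raises it.
    """
    # Shift the percentile by at most +/-20 points around the base.
    shift = max(-20.0, min(20.0, (manning_ratio - 1.0) * 100.0))
    pct = max(5.0, min(95.0, base_percentile + shift))
    ordered = sorted(scores)
    # Score at the chosen percentile (nearest-rank method).
    idx = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
    return ordered[idx]

scores = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
# Undermanned specialty (70% manned): cutoff drops below the median.
print(qualifying_cutoff(scores, manning_ratio=0.7))   # 50
# Fully manned specialty: cutoff sits at the base (50th) percentile.
print(qualifying_cutoff(scores, manning_ratio=1.0))   # 60
```

Whatever the mechanics, the paragraph above notes that any such adjustment would need to be weighed against the standardization and quality-control goals.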

All told, the regulatory effect of the SKT function cannot be denied. Its value as a managerial control device is appreciable, especially when assessed against the alternative of no control.

For the good of all. The gains achieved by the SKT program are by no means confined to the highest levels of management. The beneficiaries of the program are found at all levels, right down to the individual airman. Commanders and supervisors, faced with making selections from among relatively homogeneous groups of personnel, find a trusty catalyst in the SKT, helping to ease the onus of decision. Meanwhile, the individual airman is provided with an objective system of career progression, giving him the opportunity to compete in a servicewide arena. Under standard conditions, the testing situation affords him the chance to demonstrate anew his capacity, in terms of specialized know-how, for advancement in level of responsibility and authority.

IN SUPPORT OF TRAINING

Thus far, the discussion has dwelled mainly on those goals of the SKT program that relate to the use of the test as a managerial device in support of the airman classification system. There are certain other goals of the SKT that produce an impact on the training function.

138

Screening bypass specialists. The use of the SKT for the identification of bypass specialists is one of these goals. A recruit who has some specialty background derived from civilian occupational experience or schooling, or from prior military service, may take the SKT to ascertain his qualification in the specialty. On passing, he is awarded the semiskilled rating. Every year the bypass-specialist program yields significant savings in terms of circumvented training costs.

Motivating study. There is another goal of the SKT program that brings it into an affinitive relationship with the training program. This goal is implicit in the publication of study reference lists for the guidance of airmen in preparation for specialty testing. The effect is to motivate study on the part of the airman--to kindle his urge to acquire the technical knowledge that is considered crucial to his successful performance on the job and to his long-range career development.

Impact on training. As a consequence, preparation for the test becomes big business. Many people get into the act at all levels. Training programs go into high gear. The impact upon training is felt throughout the Air Force, but nowhere is it felt more acutely than at the Air Training Command. In turn, the latter undertakes to produce training standards and job information for Air Force publication. Not only are these made available to the trainee for his use in preparation for the test, but they are also adopted by the SKT program for input to the test-construction process.

A new approach to OJT. A noteworthy outcome of the interaction between training and evaluation has been the recent adoption by the Air Force of the dual-channel concept of on-the-job training. This approach to OJT provides for the synchronous development of the airman's career, on the one hand, and his job proficiency on the other. The end product of one of these training channels is the Career Development Course (CDC), which is a self-study course geared to the airman's specialty for his use in preparing for the next higher skill level. Since it is a self-contained package of career specialty information on fundamentals and basic principles, the CDC is a welcome source reference for use in SKT construction. Thus has the SKT fulfilled itself, in part, through its salutary impact on training.

The issue of independent evaluation. So salutary, in fact, has been the two-way interaction between training and evaluation that, for practical purposes, a sort of symbiotic relationship has emerged between the two. The intimacy of this relationship has been the subject of considerable interpretation and, possibly, overinterpretation.

The case against. One strongly voiced interpretation holds that evaluation should be an integral part of training. According to this view, the SKT program would logically be assigned to the Air Training Command. It is maintained that the closest possible coordination is needed to effect greater efficiency and economy of test production. Such a wedding would supposedly enhance the mutually supporting relationship between the two functions. Thus a higher degree of mutual responsiveness would become possible in a more intimate association.

139
The case for. Another view, the prevailing one, is that evaluation should remain independent of training. Evaluation, it is contended, is properly a coordinate function vis-a-vis training and should not be relegated to a subordinate role. To do so would be somewhat like placing the bar examiners under the law-school faculty, the teacher-certification agency under the normal school, or the auditor under the bookkeeper. This would ignore the need for checks and balances between training and evaluation as independently coexisting functions. As an evaluative instrument, then, the SKT is supposed not to be a form of training device. To regard the SKT as such would supposedly be a gross misconception of its true function, even though the SKT does have the extrinsic effect of operating as a study impeller.
effect of operAting As a ntudy LrcpeIler.<br />

I a’<br />

/<br />

* \.<br />

.’<br />

,<br />

--<br />

c<br />

According to its proponents, evaluation is better able to respond flexibly to rapid technological developments with the introduction of new subject matter, and to cope with the sudden obsolescence of old subject matter, since it is a far simpler matter to revise a test than to revise a training course. To submerge evaluation in training, it is pointed out, would impose upon the former the inherent drawbacks of the latter. It is considered crucial, therefore, that evaluation remain directly responsive to the classification standards rather than become slavishly dependent upon the training standards. As for the economies anticipated with the proposed consolidation, it is believed these would have to be calculated in terms of overall effectiveness rather than purely monetary units. The sacrifice of limited pecuniary savings is believed to be a relatively cheap price to pay as part of the cost of independent evaluation.

These two opposing positions relative to independent evaluation form the horns of a dilemma that periodically rears into view. What's the answer? Whatever the answer, it must first be recognized on both sides that the question itself is really neither one of training nor of evaluation in an exclusive sense. Rather, it is a question of manpower utilization in a comprehensive sense and probably should be approached as such. Whatever the answer, it should be identifiable neither as training policy nor as evaluation policy but as a manpower management policy. In short, it should be an Air Force answer to an Air Force question.

Summary. And there it is--one version of the major goals of specialty knowledge testing in the Air Force. We have reviewed the goals of the SKT as an instrument of management in support of the airman personnel classification system through quality control and standardization of knowledge and through systematic career progression. We have also reviewed the goals of the SKT in relation to training, not only to energize study but also to screen bypass specialists. Finally, we have scrutinized the impact of the SKT on the training program and raised the question of independent evaluation. In so doing, we have suggested that, whatever the answer, it should be transcendingly Air Force in its spirit.
. .


Uses of MOS Evaluation Test Results

J. E. HMREITM

US Army Enlisted Evaluation Center

Dr. Bruner, a mathematician and geophysicist, more renowned for his contributions to his fields than for the number of his publications, once asked his students to differentiate between mathematicians and the calculus. After a few minutes of their profound, evasive silence, he pointed out that mathematicians use the calculus to solve problems which neither can solve alone, or which mathematicians can only laboriously solve with less efficient methods. Tests of occupational capability, like the calculus, are tools which can aid in the solution of personnel management problems only when they are capably used. Test users, like mathematicians, can more effectively solve their personnel evaluation problems when they make full and appropriate use of the tests available to them.

The first Army MOS Evaluation Tests were administered in January 1959 as a basis for the award of proficiency pay. They were called MOS Proficiency Tests and, for the most part, sampled the ability of enlisted personnel in grades E-4 and above to recall the fundamentals of their primary MOS training. The raw test scores were converted to Army standard scores, weighted, and added to weighted Commander's Evaluation Report rating scales to provide a composite "proficiency score." The test scores and proficiency scores were reported to the enlisted personnel concerned and their unit personnel officers on a form entitled "Proficiency Data Card." Summaries of test results were furnished to Headquarters, Department of the Army, and major commands. The minimum score for the award of proficiency pay for each Military Occupational Specialty (MOS) was determined by the training requirements and attrition rate for the MOS and the number of proficiency payments that could be made. Lists of minimum proficiency scores were distributed throughout the Army for commanders to use as a basis for individual proficiency pay awards. As you can readily see, the first MOS Evaluation Tests (MOS Proficiency Tests) were used in two ways: They were used by commanders and unit personnel officers to determine which of their enlisted personnel met or exceeded the minimum requirements for the award of proficiency pay. The tests were also used by enlisted personnel and their officers to estimate how they ranked with all others tested in their MOS.
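The composite "proficiency score" described above can be sketched as a small computation: standardize the raw test score, then take a weighted sum with the commander's rating. A minimal Python sketch follows; the scale parameters (mean 100, standard deviation 20) and the weights are illustrative assumptions, not the Army's actual values:

```python
# Sketch of a composite proficiency score: raw score -> standard score,
# then a weighted combination with the commander's rating.
# Scale and weights are assumed for illustration only.

def standard_score(raw, pop_mean, pop_sd, scale_mean=100.0, scale_sd=20.0):
    """Convert a raw score to a standard score on an assumed 100/20 scale."""
    return scale_mean + scale_sd * (raw - pop_mean) / pop_sd

def proficiency_score(raw, pop_mean, pop_sd, commander_rating,
                      test_weight=0.7, rating_weight=0.3):
    """Weighted composite of standardized test score and commander rating."""
    return (test_weight * standard_score(raw, pop_mean, pop_sd)
            + rating_weight * commander_rating)

# Examinee scored 60 raw on a test with population mean 50, s.d. 10,
# and received a commander's rating of 110 on the same scale.
print(proficiency_score(60, 50, 10, 110))   # 117.0
```

The standardization step is what lets scores from different MOS tests be reported, compared, and combined on a common scale Army-wide.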

Current trends in the Army MOS Evaluation Program are toward increased emphasis on: sampling enlisted personnel's abilities to solve job problems, multipurpose scores, and improved reporting of results. Because of the wide variety of specialties in the Army and the large number of personnel tested, realization of these trends is necessarily more complete for some MOS than for others, though some progress has been made in all

141

areas. As new ideas are developed, or methods of adapting old ideas are visualized, immediate goals will be extended and trends modified accordingly. Since the account of things to come must evolve from the present, the scope of this presentation is limited to current happenings with a few indices of next steps and anticipated hazards.

The emphasis in test coverage has been shifted from training material content to the problems encountered by personnel assigned to the MOS between their graduation from formal training and their promotion to an advanced skill level or MOS. For MOS in which they are especially needed (Bandsmen, Typists, Stenographers, Court Reporters, and Radio Code Operators), performance tests of motor and sensory skills have been developed for standard administration world-wide and are used as supplements to paper-and-pencil tests to produce more job-related composite scores. Job-sample problems, adapted for multiple-choice answering, have been developed and used in several tests. (These problems are based upon representative assignments for the specialty skill level, with commonly encountered situational data presented in narrative form, on recorded tapes, in drawings, in fully or partially completed forms, or by other visual means. Representative solutions of outstanding and inferior specialists are provided for choices.) Several test outlines have been redeveloped along functional lines to produce subscores which reflect the comparative abilities of examinees to perform the various duty positions within a specialty. As can readily be seen, the net result is a conversion of the primarily job knowledge tests developed in the early stages of the Army Evaluation Program to more predominantly job-problem solving abilities tests. When the conversion is complete, all MOS Evaluation Test scores can be used with confidence to determine how well examinees can perform the current major duties of a military specialty rather than how much the examinees know about the fundamentals of their MOS.

Two additional index scores are developed and reported: the MOS qualification score and the promotion qualification score. The MOS qualification score is a minimum passing score used to determine which examinees should be retrained or reassigned to a more appropriate specialty skill level. The score can be used to determine which examinees with a critical primary MOS will not be awarded the supplemental pay for their assigned specialty. The method for determining the MOS qualification score is based upon the premise that experts can arrive at the absolute number of questions in a test which, if answered correctly, distinguish between the minimally qualified and unqualified personnel assigned the MOS. The Promotion Qualification Score may be used by commanders, if they desire, as a requirement for the advancement of enlisted personnel within their command to a higher pay grade or skill level. It is the score attained or exceeded by one-third of the examinees assigned in the same pay grade in the MOS skill level tested. Where such comparisons

142

were made, this score commonly lies in the range of average scores made on the same test by soldiers in the next higher pay grade.
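The Promotion Qualification Score as defined above is a cutoff attained or exceeded by exactly the top third of examinees. A minimal sketch of that computation, using a nearest-rank convention that is an assumption on our part:

```python
# Sketch: the score attained or exceeded by one-third of the examinees
# in the same pay grade and MOS skill level (ceil(n/3)-th highest score).

def promotion_qualification_score(scores):
    """Smallest score such that at least one-third of examinees scored
    at or above it."""
    ordered = sorted(scores, reverse=True)
    k = -(-len(ordered) // 3)   # ceil(n / 3): size of the top third
    return ordered[k - 1]       # the k-th highest score

scores = [52, 61, 75, 80, 88, 90, 95, 70, 66]   # nine examinees
print(promotion_qualification_score(scores))     # 88: top third scored >= 88
```

With nine examinees, three (one-third) scored 88 or better, so 88 is the cutoff; consistent with the observation above, such a cutoff tends to land near the average of the next higher pay grade.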

The Proficiency Data Card has been replaced by the MOS Evaluation Data Report which, in addition to the MOS Evaluation Score, reflects on a five-point scale the proportion of the items in each major test area the examinee answered correctly. Now, all examinees and their supervisors have available a medium for determining how each examinee's composite score (test + commander's rating) compares with the scores of all other Active Army examinees with that primary MOS skill level and how each examinee succeeded or failed in answering the questions related to the major test areas. Examinees can and should use their reported test scores as a guide when preparing for retesting during the next scheduled evaluation period for the MOS. If they concentrate on improving their skills in the areas in which they answered the smaller proportions of items correctly, they will broaden their overall MOS capabilities, and usefulness to the Army, more rapidly and efficiently than they can by diversifying their study efforts. Supervisors and unit commanders can also identify the test areas in which their subordinates answered the smaller proportions of items correctly by reviewing and summarizing the MOS Evaluation Reports of their subordinates. The results of their reviews can and should be used to plan their training programs and training emphasis.
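The per-area portion of the Data Report reduces each area's proportion correct to a five-point scale. A small sketch of that reduction follows; the area names and the equal-width bins are assumptions for illustration, since the Army's actual banding is not specified here:

```python
# Sketch: reduce per-area proportion correct to a five-point scale.
# Equal-width bins are assumed: [0,.2)->1, [.2,.4)->2, ..., [.8,1.0]->5.

def five_point(proportion):
    """Map a proportion correct (0.0-1.0) onto a 1-5 scale."""
    return min(5, int(proportion * 5) + 1)

def area_report(correct_by_area, items_by_area):
    """Return {area: five-point rating} for one examinee."""
    return {area: five_point(correct_by_area[area] / items_by_area[area])
            for area in items_by_area}

# Hypothetical examinee: items correct vs. items asked, per test area.
report = area_report({"maintenance": 18, "safety": 6, "supply": 11},
                     {"maintenance": 20, "safety": 12, "supply": 15})
print(report)   # {'maintenance': 5, 'safety': 3, 'supply': 4}
```

In this hypothetical report the examinee (and his supervisor) would concentrate further study on the safety area, exactly as the paragraph above recommends.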

Each major commander is provided summary reports of the test scores attained by the enlisted personnel tested within his command. These reflect the distribution of the MOS Evaluation Scores of the personnel tested by the command, distributed according to MOS skill level, pay grade, and principal command subdivisions. From these reports the major commander's staff can determine: (1) the numbers, locations, and pay grades of personnel tested who failed to attain the minimum score for the award of a verified primary MOS; (2) the numbers, locations, and pay grades of personnel tested whose scores are in the upper third of the scores of all personnel assigned the primary MOS skill level tested; (3) the numbers, locations, and pay grades of the personnel who may be awarded specialty pay; (4) the numbers, locations, and pay grades of the personnel who may be awarded superior performance pay. These reports supplement the strength reports of the command by providing information concerning the capabilities of personnel assigned enlisted specialties within the command. The staff is no longer limited to reports that there are X men with specialty Y in the command. The command summary reports add: A of the X men are in the upper third of all of the Y specialists in the Army; B of the X men failed to qualify for a verified primary MOS; C scored above any determined score point; the average score of personnel assigned to a given specialty skill level was ____; etc. Such data can be used for estimating training needs, locating specialists for critical assignments, and related administrative processes.
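The counts described above amount to tallying each tested soldier against the Army-wide cut points. A sketch, with field names and cut points that are illustrative assumptions rather than the actual report format:

```python
# Sketch: tally one MOS skill level's test records into the summary
# categories described above. Cut points and field names are assumed.

def command_summary(records, min_qualifying, upper_third_cutoff):
    """records: list of (score, pay_grade) tuples for one MOS skill level."""
    below_min = [r for r in records if r[0] < min_qualifying]
    upper_third = [r for r in records if r[0] >= upper_third_cutoff]
    return {
        "tested": len(records),
        "failed_minimum": len(below_min),      # failed verified-MOS minimum
        "upper_third": len(upper_third),       # Army-wide upper third
        "average_score": sum(r[0] for r in records) / len(records),
    }

records = [(45, "E-4"), (72, "E-5"), (88, "E-5"), (95, "E-6"), (60, "E-4")]
print(command_summary(records, min_qualifying=50, upper_third_cutoff=85))
```

A real summary would also break these counts out by location, pay grade, and command subdivision, as the paragraph above describes.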

143


During the past year, the MOS Evaluation Testing Program has been extended to the evaluation of the ability of Active Army personnel to perform the duties of their secondary MOS and the ability of Reserve and National Guard personnel to perform the duties of the MOS for their current duty positions. While the same tests and reports are used for these new purposes, evaluations are based upon test scores without commander's evaluation ratings and are used only to identify those examinees who failed to attain the Active Army minimum qualifying test score, for assignment and training purposes.

Army MOS Evaluation Tests are also used in determining the pay grade and MOS of commissioned and warrant officers who intend to enlist or reenlist upon termination of their current active commissioned or warrant officer service. Eligible officers are permitted to take as many as three MOS tests appropriate to pay grade E-5 or their prior enlisted temporary grade, if higher. At least one of the tests must relate to an MOS which has vacancies at their requested pay grade. None of the tests may involve an overstrength MOS. Final determination of pay grade and MOS is made by the Department of the Army Grade Determination Board based upon the test scores, commander's recommendations, and other pertinent, available data.

In each of the current applications of Army MOS Testing Programs, it is the commander who must take the final action. The commander can give an enlisted man who fails to make a passing score and acquire a verified primary MOS a second chance to improve his skills before the next scheduled test session for the MOS, or reassign him to a more appropriate MOS and skill level, but must reclassify the man who fails twice. The commander can withhold superior performance or specialty pay from personnel who attain an eligible score if he determines the individual is not currently performing in a satisfactory manner. And major commanders may restrict promotions to those whose scores fall within the upper third of the scores for their specialty skill level.

Several have suggested that MOS Evaluation Test results be used to evaluate the effectiveness of service school training. At first glance, the proposal appears to have merit; but closer inspection reveals crucial fallacies. In the first place, service school training is ordinarily designed to provide enlisted personnel with the basic vocabulary, theory, methods, and procedures that enable them to begin their on-the-job training at their first unit of assignment. However, MOS Evaluation Tests are intended to cover the period from the completion of on-the-job training to advancement to a higher skill level or MOS. In the second place, the students who fail the school MOS course are rarely, if ever, assigned the MOS, where they might succeed or fail depending upon the degree to which school training met on-the-job needs. In the third place, many of the

144


enlisted personnel evaluated, particularly senior technicians and noncommissioned officers, have not attended a service school course for many years, if at all. Consequently, much of the material acquired during service school training by such personnel would not be covered by tests restricted to current doctrine and materiel. It should also be obvious that the highly motivated, rapid learners who do well in school courses usually perform better on the job and on MOS tests than slow learners and those not motivated to make the Army their career. Any correlation between school grades and MOS test scores is influenced by the quality of school training and by learning-ability and motivation factors. The degree to which positive correlations are increased or decreased by the quality of school training and its relationship to MOS requirements cannot be determined from the school grade--test score correlations directly. Additional studies would be required to determine the extent and direction of the influence of training upon job success. While there are probably other reasons why MOS Evaluation Test scores should not be used to evaluate the quality of service school courses, the four reasons offered suffice to negate the proposal.

Others have suggested MOS Evaluation Tests be used to determine whether persons called to active duty in a mobilization may be assigned directly to a unit or whether they require preliminary training before assignment to a specialty. The Reserve--National Guard testing program obviates the need for retesting of members of Reserve and National Guard units who will be assigned to Active Army units or issued current Active Army equipment. But MOS Evaluation Tests cannot be used to determine the abilities of reservists, guardsmen, and draftees to operate, maintain, and employ the limited-standard, demothballed materiel used in situations requiring major mobilization, because MOS Evaluation Tests cover only current, Active Army doctrine and materiel. It would be necessary to develop additional tests of the abilities of examinees to perform the specific duties of the additional specialties to be used in the mobilized force and to establish minimum standards of performance for those specialties to determine which examinees require further training and which could be assigned directly to a unit. Tests of basic theories and principles would only rate the abilities of examinees to learn--not to do. It is possible that time limitations in major mobilizations would preclude the development of adequate placement tests and that the benefits derived from such a program would be less than those resulting from the refresher training of those who pass the tests.

Some have also suggested MOS Evaluation Tests can be used to predict how well an examinee will succeed on the job. That is, how will his peers and supervisors regard him? Certainly, one must agree that such a goal is desirable. However, the question is: "Is it practical at this time?"

145
Now, an MOS Evaluation Test rates examinees' abilities to perform the full scope of the duties of an MOS skill level, weighted to fit the objectives and concepts of relative importance of the standards-setting level of the Army program manager's staff. In other words, the tests are biased in favor of the broadly skilled examinee rather than one who is exceptionally skilled in a part of his MOS skill level requirements. The test score indicates the examinee's relative standing among all examinees assigned the MOS skill level. If peers and supervisors could adjust their ratings of examinees to compensate for their personal biases and influences, if they thoroughly understood and accepted the objectives and important concepts of the standards-setting level of the program manager's staff, and if they could simultaneously and comprehensively evaluate the abilities of all personnel assigned the primary MOS skill level, they would probably rate the examinee as the test rated him, provided the examinee reacted to them as he reacted to the test during the evaluation period. Obviously, peers, supervisors, and examinees interact differently, differ in their opinions as to the relative importance of tasks, have varying degrees of understanding of the specific objectives of top-level standard setters, and have rarely, if ever, had the opportunity to evaluate the abilities of all persons assigned to any MOS skill level, let alone simultaneously. The tests do provide a basis for predicting how well an examinee could do on the job when permitted to do the whole job rather than a subspecialty and when motivated in the same degree and direction as he was at the time he took the test. A great deal more information concerning his interpersonal relationships and attitudes, along with those of his peers and supervisors, their rating attitudes and abilities, and the dynamics of the group, would be required before one could predict with reasonable accuracy how an examinee would be regarded by his peers.
reasonable accuracy.<br />

A few item writing and technical publications writing groups have cooperatively used test question response data as a cue to whether technical material has been distributed to and understood by examinees. With few exceptions, the test question analyses developed by the Enlisted Evaluation Center reflect how all of the personnel in each MOS skill level responded to each test question. Items relating to equipment or doctrinal changes are carefully reviewed when only a small proportion of the examinees answer the questions, to determine whether the questions or the examinees are deficient. Checks are also made to determine when guides and training materials were distributed to examinees and whether these materials require clarification or amplification.

Many will envision other uses of MOS Evaluation Tests. In planning the uses, one must bear in mind and insure:

a. The test content and objectives are compatible with each planned use.

b. All pertinent variables are identified, evaluated, and controlled.


146<br />


c. Test scores which only represent relative standings of examinees are not logically summed, multiplied, subtracted, or divided by ordinary arithmetic processes.

d. The components and weightings of composite scores are compatible with the planned use.

e. A test designed for one purpose cannot necessarily be used for what appears to be a related purpose.

147<br />


Job Analysis for Test Development Purposes

Frank H. Price, Chairman

US Army Enlisted Evaluation Center

Today the great need in military testing -- particularly job proficiency evaluation -- is a sound understanding of the basic job. Only if the job to be evaluated is known in all of its detailed characteristics can tests be developed to adequately measure success in the job. In other words, we need information about the job in which we hope to test proficiency.

Job analysis is a process of obtaining information about jobs. While job analysis can serve various useful personnel purposes, the most important one for our consideration is that of setting personnel specifications required in a particular job. For test construction purposes, a job description is the end product of job analysis. It is vital that the job description be complete in every detail. Success or proficiency in a job cannot be properly evaluated unless the nature of the job is fully known.

There are a number of different approaches to analyzing jobs and writing descriptions of them. The papers which Dr. Morsh and Mr. McBride will present this morning are illustrative of different approaches to obtaining job information. We believe you will find the papers interesting and informative, and we hope they will stimulate job analysis effort in your own organizations.

148<br />


New Perspectives in Job Analysis

JOSEPH E. MORSH

6570th Personnel Research Laboratory, US Air Force

As the result of an intensive research program during the past five or six years, the United States Air Force has developed and applied a novel procedure for collecting, organizing, analyzing, and reporting comprehensive job information. The procedure combines features of the check list method with those of the open-ended questionnaire and the observation interview into a single integrated procedure. I am certain that research findings and products obtained thus far have implications for proficiency test development beyond those that have been utilized. I propose, therefore, to discuss the method in some detail and to present some typical end products in the anticipation that the potentialities of the method will be provocative of ideas and will elicit from members of this symposium suggestions for future research and computer programming.

Advantages of the Air Force Method

The Air Force method of job analysis has a number of advantages over traditional methods. The procedure is simple, economical, and flexible. It makes feasible the survey of large samples. It is based on joint responsibility of job incumbents, test control officers, and unit commanders. The job information is obtained in standardized, quantified or readily quantifiable form. The information is current and has been found to be highly reliable.

Job Analysis Operations

The Air Force job analysis procedure involves a sequence of several discrete steps.

a. Location and procurement of source materials.

b. Construction of first draft of job inventory.

c. Interview review of first draft by technical advisers.

d. Revision of first draft of job inventory.

e. Field review of revised draft by senior incumbents.

f. Construction of operational job inventory.

g. Selection and location of survey sample.


149<br />



h. Reproduction and mailing to selected TCO's.

i. Administration of the job inventory.

j. Responding to the job inventory.

k. Receiving, scanning, coding, and collating.

l. Key punching and verifying job inventory data.

m. Electronic computer analysis of survey data.

n. Distribution of survey results.

Now let us look briefly at each of these steps.

Source Materials

The source materials used in the construction of job inventories consist of the specialty descriptions in Air Force Manuals 36-1 and 39-1, Job Training Standards, On-the-Job Training Package Programs, Training Course Outlines, Technical Orders, and any other pertinent publications. A reference library facility is being built up which provides current source materials pertaining to all airman career fields.

Construction of First Draft

An Air Force job inventory covers tasks performed by all skill levels of one airman career ladder from apprentice, through journeyman and supervisor, to superintendent. Three persons work together in constructing the first draft of the inventory. A personnel technician or job analyst selects duty and task statements from published source materials. Upon his judgment the quality of the inventory largely depends. A clerk-typist prepares successive drafts of the inventory and may derive preliminary task statements from selected sections of publications. A supervisor editor checks format, wording, and organization of task statements into duty categories and coordinates the development of related inventories. Construction time for a job inventory varies with the complexity of the career ladder. For the less technical ladders, three to four weeks is adequate for writing the first draft. For the more technically complex career ladders the period may be twice as long -- six to eight weeks -- or even longer.

Interview Review

From three to six technical advisers, who are in the appropriate career ladder and are usually experienced senior NCO's, are interviewed individually or as a group to obtain their constructive criticism of the first draft of the inventory. These consultants are frequently the same subject matter specialists who are assigned on TDY to the Personnel Research Laboratory to build Specialty Knowledge Tests.

150<br />


Revision of First Draft

On the basis of the suggestions and recommendations of the technical advisers, the first draft of the inventory is revised. Tasks which were not listed are added, duties and tasks not performed are deleted, and improperly worded statements are corrected.

Field Review

Usually from 15 to 20 copies of the revised draft of the job inventory are reproduced in booklet form. These booklets are then mailed to Test Control Officers (TCOs) at selected bases in different geographical areas in major air commands. The Test Control Officers distribute the inventories to senior job incumbents for review. The senior NCO's, like the technical advisers, are instructed to add duties and tasks which are not listed, to delete tasks not performed in the career ladder, to revise improperly worded statements, to make recommendations for improving the inventory, and to return their suggestions and comments to the Personnel Research Laboratory.

Construction of Operational Job Inventory

Task statements added by senior job incumbents in the field are extracted verbatim from the inventory booklets, classified by type, and grouped by duty category. After careful consideration and close inspection for overlapping statements, a decision is made regarding the acceptance or rejection of each added statement or suggested modification. Accepted task statements are collated with statements in the inventory under their respective duty headings. This second revision constitutes the first operational form of the job inventory.

Sample Selection and Location

Sample size depends upon the number of incumbents available in the career ladder being surveyed. Since 2,000 is the limit of the computer program capacity, this sets the maximum size of the sample. An attempt is made to obtain approximately 500 incumbents in each of the four skill levels in the career ladder. In order to insure having a statistically adequate sample of each skill level, surveys usually have not been conducted in any career ladder where fewer than 500 airmen are assigned.

Reproduction and Mailing

TCO addresses and numbers and locations of incumbents in the appropriate specialties are determined from manning information supplied by headquarters of the several commands. The number of inventory booklets to be published may vary from about 600 to about 2,500 to allow for booklets not completed for one reason or another. Sufficient copies of the inventory for the portion of the sample under his jurisdiction are mailed to each participating TCO. Included in the package are administrative directions and other instructions for handling and returning the booklets.

151


Administration of the Job Inventory

Test Control Officers conduct the group administration of the job inventory in base testing rooms. They scan completed booklets for adherence to directions and return them to the Personnel Research Laboratory. A typical job inventory of some 300 task statements requires about two hours' administration time.

Responding to the Job Inventory

Job incumbents in the selected sample complete the inventory by first supplying certain identification and biographical information. They then check all the tasks in the inventory which they perform and write in any tasks they do which are not listed. Task statements written in by incumbents during the survey are transcribed, classified by type, and grouped by duty category. The job inventory is then revised by adding the acceptable write-in statements. This final revision of the inventory is prepared so that a current instrument will be ready whenever a resurvey is required.

Key Punching and Verifying

Upon completion of a survey, incumbents' responses entered in the inventory booklets are key punched into electronic data processing cards and verified. For each incumbent in the sample there are required a "background information" card, a "position title" card, and several task response cards, the number depending upon the number of tasks in the inventory. One such task response card is required for each 69 tasks in the inventory.
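The card requirement described above is simple arithmetic; a minimal sketch follows. The function name is mine, as is the assumption that a partial final block of tasks still occupies a full card:

```python
import math

def cards_per_incumbent(num_tasks: int) -> int:
    """Cards punched per incumbent: one "background information" card,
    one "position title" card, and one task response card per 69 tasks.
    Assumes a partial final block of tasks still needs a full card."""
    return 2 + math.ceil(num_tasks / 69)

# For the "typical" 300-task inventory mentioned earlier:
# 2 fixed cards + ceil(300 / 69) = 2 + 5 task response cards.
print(cards_per_incumbent(300))  # prints 7
```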

Computer Analysis

And now we come to the phase of the Air Force job analysis procedure which justifies the "new perspectives" of my title. It is in the processing of occupational data by means of the high speed electronic computer that the most recent major advance has been made. Computer programs have been written for the publication of a job description of the work performed by any specified group of individuals. These groups may be identified in terms of current skill level, grade, command, time on the job, geographical location, kind of base, type of previous training, or any other variable desired.

Routinely, the statistical analysis of the occupational data includes, for each skill level (apprentice, journeyman, supervisor, and superintendent), computation of the percent performing each task. Also computed are the average percent time spent by members of each group who perform the task,

152<br />


and the average percent time spent by all members of the group, both performers and nonperformers of the task. The cumulative sum of the average percent time spent by all members of the group is also shown so that for any group, tasks that consume 50 percent, 75 percent, or any other percentage of total time can readily be identified. Tasks are printed out in descending order of time spent on them.
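The routine statistics described above (percent performing, average percent time among performers, average percent time over all members, and the cumulative ordering) can be sketched as follows. The task names and time figures are hypothetical toy data, not survey results:

```python
# Hypothetical survey records: for each incumbent, the percent of work
# time spent on each task (0 means the task is not performed).
records = [
    {"file": 40.0, "type": 50.0, "route": 10.0},
    {"file": 20.0, "type": 0.0,  "route": 80.0},
    {"file": 60.0, "type": 40.0, "route": 0.0},
]

tasks = ["file", "type", "route"]
n = len(records)
stats = []
for task in tasks:
    times = [r[task] for r in records]
    performers = [t for t in times if t > 0]
    pct_performing = 100.0 * len(performers) / n
    avg_time_performers = sum(performers) / len(performers)  # members who perform it
    avg_time_all = sum(times) / n                            # performers and nonperformers
    stats.append((task, pct_performing, avg_time_performers, avg_time_all))

# Print tasks in descending order of average time spent by all members,
# with the cumulative sum that identifies the tasks consuming
# 50 percent, 75 percent, etc., of total group time.
cum = 0.0
for task, pct, avg_p, avg_a in sorted(stats, key=lambda s: -s[3]):
    cum += avg_a
    print(f"{task:6s} {pct:5.1f}% perform  {avg_p:5.1f}% time (performers)  "
          f"{avg_a:5.1f}% time (all)  cum {cum:5.1f}%")
```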

Identification of Job Types

Perhaps the most important statistical breakthrough, however, is the application, by Bottenberg and Christal, of a hierarchical grouping computer program, developed by Ward, to occupational data. This program, which represents a major advancement in the state-of-the-art, groups together incumbents who perform essentially the same work activities regardless of skill level, grade, experience, or assignment. In any career ladder there are many jobs which for all practical purposes are identical. The individuals who do these identical jobs are said to belong to the same job type. In the grouping program, the computer locates, from among perhaps 2,000 incumbents who have completed a job inventory for a particular career ladder, the two individuals who have the most similar jobs. The computer does this by comparing every possible pair of incumbents in the sample. A single job description for this pair is developed which accounts for their work time with the least error. The computer then tests all possibilities of combining a job description of a third individual with the first accepted pair, or forms a new pair. This process is continued until finally the computer forms a group consisting of all members of the sample and reports the error.

The iterative process may be terminated at any stage in the grouping program. The stopping point is a matter of judging when the error term resulting from merging somewhat dissimilar groups becomes unacceptably large. In a study involving 836 cases in the Personnel Career Ladder in which 35 job types were identified, the grouping process was stopped at the 118-group stage. At this stage there were 27 groups containing five or more members. One of these groups, which was composed of two groups identified earlier in the program, was listed as two separate job types, and seven other job types were generated at later stages in the grouping process.
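As a rough illustration of the Ward-style hierarchical grouping described above, the sketch below merges, at each step, the two groups whose combined job description incurs the least added squared error in accounting for work time. This is a toy reconstruction under stated assumptions (a squared-error merge cost over percent-time profiles), not the Bottenberg-Christal program itself:

```python
def sse(group):
    """Within-group sum of squared deviations of task-time profiles from
    the group's mean profile (the 'error' a single merged description incurs)."""
    dims = len(group[0])
    mean = [sum(p[d] for p in group) / len(group) for d in range(dims)]
    return sum((p[d] - mean[d]) ** 2 for p in group for d in range(dims))

def ward_grouping(profiles, stop_at):
    """Repeatedly merge the pair of groups whose union raises total error
    least, until `stop_at` groups remain (Ward's hierarchical method)."""
    groups = [[p] for p in profiles]
    while len(groups) > stop_at:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                cost = sse(groups[i] + groups[j]) - sse(groups[i]) - sse(groups[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        groups[i] = groups[i] + groups.pop(j)
    return groups

# Toy percent-time profiles over three tasks: two obvious "job types".
profiles = [(90, 10, 0), (85, 15, 0), (5, 5, 90), (10, 0, 90)]
groups = ward_grouping(profiles, stop_at=2)
```

In practice the stopping point is chosen, as the text notes, by watching for the stage at which the merge cost jumps sharply.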

In some job types work is concentrated upon a few tasks, while the work of other job types is quite diverse. In general, it is found that the number of tasks performed is directly related to skill level -- the higher the level, the more tasks are done. Certain supervisory job types can readily be differentiated from technical job types, both in terms of specific tasks performed and in terms of skill levels of members forming the groups. Other job types cut across skill levels.

Purposes of Job Analysis

The Air Force method of job analysis has been designed, not for a specific purpose, but rather as a general procedure, the results of which
153<br />


can be adapted to many uses. The Personnel Research Laboratory now has, or can develop, programs for producing reports to service the needs of many agencies. In connection with training, these programs can be used to validate training standards, design training courses, determine which tasks should be taught in school and which should be learned on the job, indicate which tasks should be taught early and which should be postponed, and so on. Programs can be developed to validate qualitative personnel requirements information, to aid in the establishment of specialty qualification requirements, and to identify the need for new specialties and shredouts. Job analysis results may be used to guide the development of selection and classification tests, to improve assignment procedures, to determine standards of job performance, to provide a basis for job evaluation, and to contribute to manpower and organizational analysis.

Job Analysis for Test Development

In addition to the purposes outlined, one of the major functions of the Air Force method of job analysis is that of providing data for test development. Data derived from computer programs may be used to maximize the content validity of Specialty Knowledge Tests, and to establish better measures of on-the-job proficiency. Results of many surveys now available show the percent of members of each skill level performing each task, and tasks performed by the various skill levels arranged in descending order of time spent on them. As the grouping program becomes operational, tasks performed by various job types in each career ladder will be identified.

New Perspectives

At the present time, vigorous research efforts are being directed toward improvement in flexibility and capability of the current computer grouping programs. Other research is devoted to the identification of significant task rating factors and to the development of methods for obtaining other ancillary job information from incumbents or their supervisors. Many possible factors are being considered. In studies now under way, the following task or job rating factors are being investigated:

a. Frequency of task performance.

b. Importance of task compared with other tasks done.

c. Technical assistance required.

d. Difficulty of learning to do task.

e. On-the-job training required to perform task.

~



f. Difficulty of learning to do task by OJT.

g. Training emphasis task should have.

h. Time spent in special training for job.

i. Extent to which job gives satisfaction.

In some surveys the task rating factor used in addition to time spent is specifically tailored to fit a particular career ladder. Similarly, other information sought has specific reference to certain special courses or kinds of equipment used. In a recent survey of the Administrative Career Ladder, for example, incumbents were asked to indicate administrative courses they had attended. They were also required to give the number of hours per week usually spent in typing, and whether a manual, electric, or both kinds of typewriter were used. Since the Training Command was interested in indications of the words-per-minute rates to which students should be trained, incumbents were asked to check their typing speed on a six-point scale. The survey data have not yet been analyzed, but with an incidental sample of 105 inventories the following results were shown:

Average typing speed

Under 15 WPM         3
15 - 24 WPM          5
25 - 34 WPM          9
35 - 44 WPM         30
45 - 54 WPM         36
55 WPM or Over      17

Total              100

If the final survey, when validated, corroborates this sample, it appears that the results have obvious implications for training.
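A trivial tabulation of the bracket figures above shows the cumulative picture the Training Command would presumably examine. The inference that the figures are percentages of the sample is mine, based on their total of 100:

```python
# Reported typing-speed brackets and figures from the table above
# (treated here as percentages of the sample, since they total 100).
brackets = [("Under 15 WPM", 3), ("15 - 24 WPM", 5), ("25 - 34 WPM", 9),
            ("35 - 44 WPM", 30), ("45 - 54 WPM", 36), ("55 WPM or Over", 17)]

cum = 0
for label, pct in brackets:
    cum += pct
    print(f"{label:15s} {pct:3d}   cumulative {cum:3d}")

# Only 17 of the 100 fall below 35 WPM, so a standard in the 35-44 WPM
# bracket would match the reported speeds of most incumbents.
```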

Great advances have been made from the traditional observation, check list, questionnaire, and interview methods of job analysis. The feasibility of the Air Force method has been amply demonstrated in an operational setting. Some of the potentialities of the method have been examined, but for the most part the field of occupational analysis is comparatively unexplored in the light of modern scientific techniques. However, new perspectives have been revealed and at least some of the problems have been identified and defined.

155


CURTIS D. McBRIDE

US Army Artillery and Missile School

The US Army is one of the largest producers of achievement-type tests within the United States today. During fiscal year 1964 over 1,000 different achievement-type tests were produced by the US Army. These achievement-type tests are referred to as MOS Evaluation Tests within the US Army.

The US Army is organized under a decentralized concept for the production of the required testing material because it would require a very large and expensive test construction staff and administrative establishment if all the test material was produced at a centralized point.

The US Army Enlisted Evaluation Center, Fort Benjamin Harrison, Indiana, acts as the coordinating agency for the MOS evaluation testing program between the various schools within the US Army, which furnish the raw test material, and the Office of Personnel Operations, Department of the Army.

When a decentralized organization of this type is used in the production of test material, several problems are created for US Army testing personnel. One of these problem areas, MOS Evaluation Test outlines, is the topic for our discussion.

A test plan, or by US Army terminology an MOS Evaluation Test outline, for a specific MOS Evaluation Test is originally constructed by test specialists at the US Army Enlisted Evaluation Center, Fort Benjamin Harrison, Indiana. The MOS Evaluation Test outline is forwarded to the pertinent US Army Schools for their review and revisions. The US Army Schools normally rely on personnel within the School departments for the test outline review and revisions. After the test outlines have been reviewed and revised by the US Army Schools, they are returned to the US Army Enlisted Evaluation Center where a final revision is performed, and the test outline is then used by the US Army Schools as a guide when constructing test items.

When this system is used in reviewing and revising MOS Evaluation Test outlines, it should be noted that the review and revision have been accomplished only by test specialist and instructor personnel within the Army Schools. For the most part, no enlisted personnel actually working in the job have been involved.

Some of the faults found in using a procedure of this type for constructing MOS Evaluation Test outlines are as follows:

a. Publications used by test specialists as a basis in deciding the subject-matter areas to be included in a test outline are in many cases not current. For example, one of the major publications used by test specialists within the US Army when constructing a test outline is a job
156


description published in AR 611-201. This job description lists all the skills and knowledge pertaining to a particular job (Military Occupational Specialty). In many instances these job descriptions have not been updated for several years. There are many other publications relied upon by the test specialist that are not updated.

b. The test specialist and the reviewing and revising personnel who are constructing a test outline for a particular Military Occupational Specialty have had very limited experience with the job in many cases; therefore, the test outline might include areas to be tested that are not significant, or it might exclude areas that definitely should be sampled. Lack of experience with the job also results in poor weighting of the test outline.

c. Some of the terminology used by the constructors of test outlines, who have had limited experience in the jobs for which the test outlines are being built, is not stated in terms that are understandable to the item writers and examinees.

The US Army Artillery and Missile School recognized some of the inherent weaknesses of reviewing a test outline through a decentralized organization and began to plan some means of improving test outline review procedures. There was some thought of sending questionnaires to enlisted personnel and asking for suggestions on changes to the test outline. Another idea entertained was that of making visits to some Army organizations, talking with enlisted personnel, and asking for suggested improvements to the test outlines. These ideas seemed unfeasible for various reasons. It was finally decided that the US Army Artillery and Missile School would use what it calls an MOS Evaluation Test Outline Seminar in an attempt to improve test outline review procedures.

The underlying idea of the MOS Evaluation Test Outline Seminar is to garner the thoughts of test specialists, experienced School instructors, and field-experienced enlisted personnel at various skill levels and use these thoughts when constructing, reviewing, or revising test outlines, thus creating a test outline balanced on academic school thought, test specialist thought, and job experience thought.

These seminars are still in the experimental stage. To date, two MOS Evaluation Test Outline Seminars have been held. The success of the last two seminars has indicated that the US Army Artillery and Missile School will use seminars in the future as a regular part of its test outline review procedures. The last MOS Evaluation Test Outline Seminar was held on 16 September 1964 at the US Army Artillery and Missile School for the purpose of reviewing and revising the MOS Evaluation Test outline for MOS 142 (Heavy and Very Heavy Field Artillery Crewman). Those taking part in the seminar included 20 enlisted personnel, a test specialist from the Enlisted Evaluation Center, a senior instructor from the Gunnery Department, and test specialists from the US Army Artillery and Missile School.

157


When a request was made for the names of enlisted personnel who would attend, the request specified that the individuals should be the "best" qualified in the 142 MOS and have considerable field experience. The request also indicated that all skill levels within the MOS should attend. The active military service represented by the enlisted men who participated in the seminar ranged from 10 to 29 years -- a combined total of 296.5 years, of which 116.5 years are associated with MOS 142. The active military service per individual averaged 14.8 years. The senior school instructor who attended had a total of 22 years' experience in a field closely related to the MOS. The test specialist attending had considerable experience in the field of test construction.
the field of test construction.<br />

The individuals attending the seminar were thoroughly briefed on the role they were to play during the seminar and the procedures to be followed. Personnel were divided into two working groups with all skill levels represented in each group. The test specialists acted as monitors for the working groups.
working groups.<br />

Each member of the working groups was given a copy of the 142 MOS Evaluation Test outline as proposed by the Enlisted Evaluation Center, minus the weights. The morning session consisted of a review of the test outline subject-matter area descriptions, with all members of the working groups discussing revisions, changes, or additions to the area descriptions. This allowed the viewpoints of the enlisted personnel, the test specialist, and the senior school department instructor to be presented on an informal basis, with the consensus determining what should or should not be included in the test outline subject-matter areas. The afternoon session was devoted to the weighting of the test outline by the working groups using the same general procedures as already discussed. A critique of the day's work closed the MOS Evaluation Test Outline Seminar.
closed the MOS Evaluation Teat Outlina Seminar.<br />

After the seminar, the test specialists from the US Army Artillery and Missile School and the Enlisted Evaluation Center reviewed the results of the suggested deletions, revisions, and changes indicated by the two working groups during the Seminar and constructed a finalized test outline that was based on the combined thoughts of the three major groups involved. The finalized test outline was sent to the pertinent School departments at the US Army Artillery and Missile School for their review and comments and then forwarded to the US Army Enlisted Evaluation Center for final review.

An MOS Evaluation Test outline that has been processed through an MOS Evaluation Test Outline Seminar results in a test outline that:

a. Contains subject-matter areas significant to the job for which the examinee is being tested.

158<br />


b. Contains terminology that should be clear to anyone within the MOS, because the terminology has been written in terms which met the satisfaction of the individuals taking part in the seminar.

c. Contains properly weighted subject-matter areas, because it represents the combined thoughts of the various types of groups who were involved in the seminar.

The main strength of the MOS Evaluation Test Outline Seminar lies in its ability to create a test outline that contains a balance of thought among on-the-job experienced enlisted personnel, experienced school instructor personnel, and test specialists.

The US Army Artillery and Missile School feels that the MOS Evaluation Test Outline Seminar will develop into a valuable tool to be used when constructing, revising, or reviewing test outlines in the future, and it will provide a valuable input on which to make recommendations to revise the MOS job descriptions and other related training publications.

159


Item Writing Procedures for Increasing Validity

I. J. WEwEilw, Chairman

6570th Personnel Research Laboratory, US Air Force

The chairman introduced the subject by stating that the starting point for increasing the validity of items is in the job analysis and the outline. The first thing the writer must know is what he is supposed to be measuring. If he is measuring knowledge, one type of item is required, and if he is measuring job skills, another type is called for.

The participants brought out the various problems they have in this area due to the variations found in each of the services' approaches to this problem. When the discussion developed the great interest the services have in a proper job description and job analysis to help solve this problem, the chairman reminded the participants of the work being done in this area by the USAF. He touched on this very briefly, reminding the group that a fuller treatment was on the program at a later session.

There seemed to be agreement that the job analysis and outline were the foundation of a good test item; the problem was in getting this analysis.

160<br />




Non-Empirical Validation of Test Items

The following paper constituted a handout. The group discussed the points under 1a, b, c, and d, and 2a, b, and c on pages 4 and 5 of the paper. No decisive conclusions were derived by the group due to time limitations.

The discussion of the problem presented in this paper should be further clarified, specific examples furnished, and then should be presented to the MTA as a work seminar at the next meeting.

161<br />



Presentation to MTA Panel Considering
Non-Empirical Validation of Test Items

The fact that validity is many-faceted and difficult to determine does not lessen the importance of the problem. The tests produced by all services here are of tremendous import. (The results of these tests determine which enlisted personnel get additional pay for superior performance, and help determine which get promoted to the next higher pay grade.) Thus, the economic and leadership status and quality of the enlisted personnel of our Armed Forces are affected strongly by our tests. As a result, the morale and quality of our fighting forces are directly affected by the military testing program--our tests.

It is our duty to ask ourselves, "How valid are our tests? Do our tests identify the best informed and most skilled personnel in the same competitive occupational area? Do they truly identify the least informed and less skilled?" Because these evaluation tests do affect the leadership and morale of the enlisted structure of our nation's fighting forces, we are interested in determining and improving the validity of our items, and thus of our tests.

The California Test Bureau has categorized the two basic approaches to the determination of validity according to the chart on page 167.

Thorndike and Hagen (1961) list the specific considerations entering into evaluations of tests as (1) validity, (2) reliability, and (3) practicality. Validity refers to the extent to which a test measures what it is intended to measure. Reliability has to do with the accuracy and precision of a measurement procedure--consistency and reproducibility. Practicality is concerned with many factors, such as economy, convenience, and interpretability--which determine whether a test is practical for widespread use.
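One common way of estimating the reliability defined here is the split-half method, which is not described in this paper but fits the "consistency and reproducibility" idea: correlate the two half-test scores and step the result up with the Spearman-Brown formula. The half-test scores below are hypothetical.

```python
# Split-half reliability sketch (hypothetical scores): correlate odd-item
# and even-item half scores, then apply the Spearman-Brown correction
# r_full = 2r / (1 + r) to estimate full-length reliability.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

odd_half  = [10, 14, 9, 16, 12, 11]   # examinees' scores on odd items
even_half = [11, 13, 10, 15, 11, 12]  # same examinees, even items
r_half = pearson_r(odd_half, even_half)
r_full = 2 * r_half / (1 + r_half)    # Spearman-Brown step-up
print(round(r_half, 2), round(r_full, 2))
```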

A reading achievement test requires people to select certain answers to questions. Pencil marks on answer sheets determine each person's score. This score is called his reading comprehension score, but the score is NOT the comprehension. It is a record of sample behavior. Any judgment regarding comprehension is an inference from this number of allegedly correct answers. Its validity is not self-evident but must be established on the basis of adequate evidence.

Thorndike and Hagen (1961) state, "A test may be thought of as corresponding to some aspect of human behavior in three senses." To identify these categories they use the terms (1) "represent," (2) "predict," and (3) "signify." Of these three types of evidence of validity, our topic is principally concerned with the first type, to wit: How well do our tests represent the standards (the practical desired level of performance)


162<br />


of the occupation or job? In proportion as the level of job performance (achievement of job standards) is accurately represented in the test, the test is valid. The process of obtaining this proportion of job performance to test content is essentially a rational or judgmental one. This analysis is sometimes spoken of as rational or logical validity. Since the analysis is largely in terms of the content of the test, the term content validity is sometimes used.

We should not think of content too narrowly, because we are interested in process as much as in simple content. Thus, in the test of a mechanic we are concerned with such "content" elements as forms for requisitioning materials, of what materials rivets or bolts should be constructed, rules of safety, principles of expansion and contraction, and remote reading gages. We are also interested in such "process" skills as troubleshooting and correcting defects of equipment, the ability to solve work assignment problems, the ability to organize a repair crew for a particular job, and the use of correct procedures in solving many other specific job problems.

Thus, I submit the thesis that the problem of appraising content validity of a test starts with the detailed listing of the standards of job performance, and includes successively ranking the content of the job standards and comparing this with the blueprint of the test. After this task is completed, the items must be compared with the test outline and thus with the job standards.

The extent to which the test outline areas (and thus the job standards) are represented in the test is a crucial indicator of the validity of the test.

However, we cannot ignore predictive and concurrent validity. Before we consider techniques or methods of obtaining content validity of tests, it is noteworthy to consider the empirical method of validating tests, which Thorndike and Hagen (1961) classify as predictive validity.

As Thorndike and Hagen (1961) state, and I quote them verbatim, "...predictive validity can be estimated by determining the correlation between test scores and a suitable criterion measure of success on the job. The joker here is the phrase 'suitable criterion measure'....One of the most difficult problems that the personnel psychologist or educator faces is that of locating or creating a satisfactory measure of job success to serve as a criterion measure for test validation....All criterion measures are only partial in that they measure only a part of success on the job, or only preliminaries to actual job performances...an ultimate criterion is inaccessible to us...and substitutes (intermediate criteria) are only partial and are never completely satisfactory...The problems of effective rating of personnel are discussed in detail in Chapter 13. It suffices to indicate here that ratings are often unstable and are influenced by


163<br />


many factors other than the proficiency of personnel being rated." The authors list the general limitations of rating procedures (pp. 383-384). Thus, such ratings should seldom, if ever, be used as the sole or determining evidence of validity. For example, both logical analysis and research on ratings indicate that peers and supervisors tend to rate somewhat different qualities and to weight these differently even when given exactly the same scales and definitions.

What I am proposing is related to an area of communications research called "content analysis." In his book Content Analysis in Communication Research, Dr. Bernard Berelson defines content analysis as "...a research technique for the objective and quantitative description of the manifest content of communication." Of course, the test item is the communication whose manifest content interests us.

Before proceeding further, perhaps we can avoid quibbling over semantic differences by agreeing to use the English and English (1958) definitions of certain terms basic to our discussion.

Definitions

1. Objective of test: To objectively measure the degree to which a person has mastered all elements of a specific occupation (the word degree implies separating workers on the basis of differing abilities).

2. Job Mastery: Such proficiency in a specific occupation that certain defined standards of accomplishment can be met perfectly (English and English, 1958).

3. Discrimination: The process of detecting differences (our tests must be able to measure differences in degree of job mastery among people working in the same occupation).

4. Standard: That which is expected; a practical, desirable level of performance (English and English, 1958).

5. Job Standard: Army Regulation AR 611-201 (1961), page 6, states "...d. Fifth digit....These skill level designations indicate the level of proficiency required in a specific job and the corresponding qualification of an individual..." (with the exception of MOS Code 718). NAVPERS 18068 (1958) lists the job standards of Navy enlisted occupation groups. Air Force Manual 35-1 (1957) lists the job standards for the Air Force enlisted occupations. Similar regulations list job standards for the Royal Canadian Navy, the US Coast Guard, the Merchant Marine Academy, and the Marine Corps.


164<br />


For other definitions, English and English (1958) is available for reference if need be during this discussion.

A test item is only valid to the degree that it represents the job standard it is meant to sample, and to this degree it contributes to the validity of the test as a whole.

In order to guide us in making useful progress in the short time available, I have listed major steps which should help us to work out an acceptable process for logical validation of test outlines and test items.

1. For logical validation of test outlines we must accomplish the following:

a. Determine what constitutes the panel of experts.

b. Determine the criteria which the panel of experts should use in ranking elements of job standards.

c. Determine the methods to use in correlating the test outline with its assigned weights--as it exists--with the rank order of job standards arrived at by the panel of experts.

d. What criteria, if any, should proposed changes to test outlines be required to meet before they are incorporated into existing, validated test outlines?

2. For logical validation of test items, we must accomplish the following:

a. Determine what constitutes a panel of experts. (Should they be the same personnel who validate job standards?)

b. Determine the criteria which the panel of experts should use in ranking test items.

c. Determine the methods to use in correlating test items with the following:

(1) Original test outline areas.

(2) Rank order of job standards (test outline) obtained from the panel of experts.
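Step 1c above asks for a method of correlating the outline's assigned weights with the panel's rank order of job standards. A Spearman rank correlation is one candidate method, not prescribed by the paper; the areas, weights, and panel ranks below are hypothetical.

```python
# Spearman rank correlation sketch for step 1c: compare the rank order
# implied by the outline's assigned weights with the panel of experts'
# rank order of the corresponding job standards (all values hypothetical).
def spearman_rho(rank_x, rank_y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), for untied ranks."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))

outline_weights = {"Gunnery": 35, "Maintenance": 25, "Safety": 20,
                   "Supply": 12, "Administration": 8}
# Rank areas 1..n by descending outline weight.
weight_rank = {a: r for r, (a, _) in enumerate(
    sorted(outline_weights.items(), key=lambda kv: -kv[1]), start=1)}
panel_rank = {"Gunnery": 1, "Maintenance": 3, "Safety": 2,
              "Supply": 4, "Administration": 5}

areas = list(outline_weights)
rho = spearman_rho([weight_rank[a] for a in areas],
                   [panel_rank[a] for a in areas])
print(f"rho = {rho:.2f}")
```

A low rho would signal that the outline's weights disagree with the experts' judgment of the job standards and should be reviewed.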

165<br />



We probably should not concern ourselves here with the technical accuracy of items in a test, as this point related to content validity usually is validated by checking the publication used in referencing the item.

A point which should be considered is that such research as has been conducted has failed to find any significant difference in validity of items regardless of whether they conform precisely to "item construction principles" or not.

1. Frequency of performing the task: How important is this factor? What is the relationship of frequency, or routine, of job task to the criticality of the task?

2. Criticality of performing the task: What is the result if the task is NOT performed? If the task is NOT performed correctly? safely? Will men be killed? equipment ruined? mission fail to be accomplished? etc.

3. Knowledge essential to perform the task.

4. Process, or use, of the knowledge to perform the tasks.

5. How well do the problems in this area discriminate between enlisted persons having high, average, and low levels of job mastery?

The scales attached are suggested for your use in listing succinctly the essential criteria expert judges should use in categorizing job tasks and test items at one end of the scales or in the middle. Scratch paper is attached also for your convenience.

NOTE: Several copies of the content validation study being conducted by Major David Culclasure, MOS Evaluation Test Project Director, US Army Medical Field Service School, Fort Sam Houston, Texas, are available to panel members for information and use for suggestions as to procedures. We invite Major Culclasure to discuss his study with us at this point.

166



LOGICAL

Content Validity

Refers to how well the content of the test samples the subject matter or situation about which conclusions are to be drawn. Content validity is especially important in an achievement test. Examples: textbook analysis, description of the universe of items, adequacy of the sample, representativeness of the test content, intercorrelations of subscores, opinions of a jury of experts.

Item Structure

Includes (1) corroborative evidence from item analysis supporting the other characteristics of the test; i.e., interrelationships between items, between items and scores, and between items and criteria; (2) item composition. For graphic items, it emphasizes perceptual clarity and related format functions. For verbal items, it emphasizes conceptual clarity in the expression of items. For both graphic and verbal items, it emphasizes functions of distractors.

Construct Validity

Concerns the psychological qualities a test measures. By both logical and empirical methods the theory underlying the test is validated. Examples: correlations of the test scores, factor analysis, use of inventories, studying the effect of speed on test scores.

EMPIRICAL

Predictive Validity

Relates to how well predictions made from the test are confirmed by data collected at a later time. Examples: correlations of intelligence test scores with course grades, correlation of test scores obtained at beginning of year with marks earned at the end of the year.

Concurrent Validity

Refers to how well test scores match measures of contemporary criterion performance. Examples: comparing of scores for men in an occupation with those for men-in-general, correlation of personality test scores with estimates of adjustments made in the counseling interviews, correlation of end-of-course achievement or ability test scores with school marks.



References

Berelson, B. Content analysis in communication research. Glencoe, Illinois: Free Press, 1952.

California Test Bureau. A glossary of measurement terms. Monterey, California.

Chief, Bureau of Naval Personnel, US Navy. NAVPERS 18068. Washington, D. C., 1958.

English, H. B., & English, A. C. Dictionary of psychological and psychoanalytical terms. New York: Longmans, Green & Company, 1958.

Headquarters, Department of the Air Force. Air Force Manual 35-1. 1957.

Headquarters, Department of the Army. AR 611-201. Washington, D. C., 1961.

Thorndike, R. L., & Hagen, Elizabeth. Measurement and evaluation in psychology and education (2d ed.). New York: John Wiley & Sons, Inc., 1961.

170




Test and Item Revision Techniques

J. E. PARTTHCXOH, Chairman

Test and item revisions are necessary for a variety of reasons. The main reason for such revisions is to bring about improvement in measuring instruments. A test must be revised if the job requirements are changed and if it is not functioning properly as a measuring instrument. Among the many reasons for revising items are: obsolescence, too easy, too difficult, do not discriminate between those who have mastered the job and those who have not, lack of validity, not job oriented, require rote memory to answer.

Techniques for Item Revision

Statistical data, particularly the item analysis, can serve as a tool or guide for indicating areas within items which are in need of revision. It is necessary for the test psychologist and the subject-matter expert to work together in analyzing and literally "taking apart" the test item which, according to the statistical analysis, has not functioned in the way that it should. Even though the subject-matter expert may know his subject, he may not be able to analyze the poorly-functioning item and bring about changes in it. It is usually helpful for the test psychologist to ask the subject-matter expert to describe in detail the process or actions through which the examinee must go to answer the particular item. This should bring into clear focus for the subject-matter expert the specific requirements on the part of the examinee when he is faced with the problem which the item presents. When this approach is taken to item revision, it is usually easy for the subject-matter expert to see the reason for the poor functioning of the item.

The following actions or consideration of the following factors should be helpful in the revision of test items.

a. Items should be so written as to make their distractors as attractive as possible.

b. Look for undetected ambiguity in item distractors which misleads examinees. Item stems may also be ambiguous.

c. Eliminate nonfunctioning distractors.

d. Use hints provided by the data concerning mental processes of examinees.


171



e. If a distractor is discriminating in the wrong direction it may represent an aspect that cannot be removed without destroying the whole point of the item. The item may be covering a point about which there is much misinformation; the distractor should not be revised if the point of the item is lost through revision.
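The statistical item-analysis data referred to above can be illustrated with a minimal upper-group/lower-group sketch; the responses and the key below are hypothetical. It computes item difficulty, a discrimination index (proportion correct in the upper group minus the lower group), and per-option counts that would expose a distractor discriminating in the wrong direction.

```python
# Minimal item-analysis sketch (hypothetical data): for one four-option
# item keyed "B", compare the upper and lower scoring groups.
from collections import Counter

upper = list("BBABBCBBBB")   # responses from the 10 highest scorers
lower = list("ABCADBBDCA")   # responses from the 10 lowest scorers
key = "B"

p_upper = upper.count(key) / len(upper)
p_lower = lower.count(key) / len(lower)
difficulty = (upper.count(key) + lower.count(key)) / (len(upper) + len(lower))
discrimination = p_upper - p_lower   # D index: should be clearly positive

print(f"p = {difficulty:.2f}, D = {discrimination:.2f}")
# A distractor chosen more often by the upper group than by the lower
# group is discriminating in the wrong direction and deserves review.
for option in "ACD":
    print(option, Counter(upper)[option], Counter(lower)[option])
```

A distractor drawing no responses from either group is nonfunctioning (point c above); one drawing more upper- than lower-group responses is the case discussed in point e.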

The following quotation from Davis' chapter on "Item Selection Techniques" in Lindquist's Educational Measurement should be helpful when revision techniques are applied to tests and items.

"It is interesting that many invalid distracters are found in items that have been carefully edited and checked by subject-matter experts. This emphasizes the well-known fact that because a distracter is discriminative in the wrong direction we cannot conclude that it is too nearly correct from a factual point of view. Conversely, the fact that an incorrect choice is too nearly a correct answer does not necessarily mean that it will turn out to be discriminative in the wrong direction when it is subjected to item analysis. Item analysis techniques cannot alone be relied upon to detect errors and ambiguities; expert criticism and editing are indispensable in test construction. The full value of item analysis techniques cannot be realized unless criticisms of the items by recognized authorities are available for reference."

Reference

Davis, F. B. Item selection techniques. In Lindquist, E. F. (Ed.), Educational measurement. Washington, D. C.: American Council on Education, 1951, 308.


172



PAPER

Answers to Common Criticisms of Tests

FRANK H. PRICE, JR.

US Army Enlisted Evaluation Center

It is appropriate at a conference of this type that we consider some of the common criticisms of tests. This consideration is especially fitting since the mass testing movement was spawned as a result of the rush to military preparedness for World War I.

The criticisms of testing come from many sources both within and without professional psychology. Generally, the psychologist critics are constructive while the lay critics are destructive. This morning we are concerned with these lay critics who have protested in the popular press and in books such as Hoffmann's The Tyranny of Testing, Harrington's Life in the Crystal Palace, Gross's The Brain Watchers, and Whyte's The Organization Man, indicating public distrust, uneasiness, and ignorance about which we must be concerned.

Most of our critics have leveled their blasts at so-called educational and industrial tests. Very seldom have our military testing programs been the direct victims of such scathing attacks; but just because we have not been the subjects of eloquently set forth pronouncements does not mean that we are not discussed and "cussed" in the dayrooms and barracks of those we test, and even in the offices and headquarters of those for whom we test.

My remarks this morning will not be limited to testing in the military setting. Almost all of the recent protests against psychological testing are as applicable, if not more so, to the testing of military personnel as to the testing of school children, college students, or industrial applicants and employees. And I hope that you will be able to apply this discussion of some of the criticisms in terms of the particular problems and interests of your military programs.

In this vein, I will devote particular attention to The Tyranny of Testing (1962), primarily because Hoffmann appears to be more sophisticated and devious, but slightly less venomous, than others of our critics. To give credit where credit is due, a symposium on this subject presented by Owens, Astin, Dunnette, and Albright at the 1963 meetings of the Midwestern Psychological Association has provided much of the source material for this paper.

First, let us examine some of the major assumptions found in Hoffmann's detailed indictment of the multiple-choice test. He (p. 150) states, and here he is talking about multiple-choice tests -- the same type we construct and administer -- "The tests deny the creative person a significant opportunity to demonstrate his creativity and favor the shrewd and facile candidate over the one who has something to say." The

173<br />




problem with such an assumption is that there is virtually no satisfactory research defining creativeness. Usually the people who score high on so-called "creativity" tests are simply the ones we call creative. In fact, there is considerable evidence to indicate that creativity tests are not actually measuring creativity as a personality trait (Thorndike, 1963). There is no evidence from carefully conducted and logically interpreted research to indicate that objective tests stifle the creative person. Hoffmann's charge is what he thinks should be fact rather than research data. In other words, Hoffmann has the idea that merely because multiple-choice tests are highly structured, the examinee has no opportunity to express himself. Nothing could be further from the truth, but the degree to which the examinee can express his knowledge depends on the skill and the data of the test writer.

In his second assumption, Hoffmann states that multiple-choice tests "...penalize the candidate who perceives subtle points unnoticed by less able people including the test makers. They are apt to be superficial and intellectually dishonest with questions made artificially difficult by means of ambiguity because genuinely searching questions did not readily fit into the multiple choice format." In this assumption the great amount of careful research actually going into the construction and validation of a test item is completely ignored. Naturally, distractors are written purposely to "fool" the less knowledgeable examinee. Information about the responses to items made by persons of different levels of knowledge indicates without a doubt that the degree of ambiguity perceived by an examinee is inversely related to his knowledge of the subject matter. This simply means that the less one knows, the more ambiguous the question appears. Yet, Hoffmann states (p. 67), "The more one knows about the subject the more glaring the ambiguities become." Of course Hoffmann does not support his assumption with evidence; nevertheless, this charge is the one with which we are most often hit. He says that the most serious consequence of test ambiguity is that it

penalizes the gifted and talented examinee. In Hoffmann's view, how does this discrimination occur? When first confronted with the alternative answers to a question, the "deep" examinee, as Hoffmann calls the gifted and talented, analyzes the alternatives more carefully than does the "superficial" examinee. Naturally, such careful scrutinizing takes time, and the first penalty occurs. Secondly, the "deep" student is much more likely to perceive the ambiguities and, as a result, spends more time trying to determine exactly what the test author had in mind. Furthermore, according to Hoffmann, the "deep" examinee's motivation to perform well tends to be reduced as he sees more clearly the superficiality and ineptness of the test writer's approach. Even more damaging, the gifted examinee is more likely to discover a "better" alternative than the keyed response.

174<br />




Hoffmann attempts to document his reasoning -- and it is merely “arm-chairing” -- that multiple-choice test questions are by nature ambiguous by citing sample items from test manuals or, more frequently, by attacking illustrative items of his own. A Hoffmann-type item can quickly illustrate the kind of rumination which forms the main content of his attack:<br />

“What are the colors of the American flag?”<br />
(A) red, white, and blue<br />
(B) gray<br />
(C) neither A nor B<br />

The superficial examinee quickly selects answer “A,” red, white, and blue, and goes to the next question. The “deep” examinee begins to scrutinize and analyze the alternatives; he thinks, “A” is correct under some conditions, but “B,” gray, is correct under some conditions too -- twilight, poor illumination, total color blindness. Both “A” and “B” could be correct under some conditions, but the other alternative, “neither A nor B,” could not be correct. Supposedly, he wonders if the question is a trick, if the test writer is malicious or just plain ignorant, and what was really wanted. Finally, in desperation, he throws up his hands and says, “This is an absurd test and I don’t see how any intelligent person can be asked to take it seriously!” And I have heard just that in the field on more than one occasion.<br />

This charge of ambiguity is one of most serious consequence to us and one on which we are most vulnerable unless we consider the fundamental distinctions among the purposes of tests and test items. I am referring to the distinction between tests which are used as criteria and tests which are used as predictors. Criterion tests -- for example, achievement tests for comparing the effects of different methods of teaching -- may be open to Hoffmann’s criticisms. But his objections are irrelevant in the case of predictive tests -- for example, tests to select the most promising job applicant or the best qualified soldier for a special assignment. Basically a criterion test must be content valid -- that is, we must be able to defend the test on rational and logical grounds. All that is required of predictive tests is that they successfully predict future performance.<br />
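Predictive validity of this kind is conventionally summarized as a validity coefficient: the correlation between test scores and the later criterion measure. A minimal sketch in modern notation follows; the data and variable names are invented for illustration and are not from the report.<br />

```python
# Hypothetical sketch: a predictive validity coefficient is the Pearson
# correlation between selection-test scores and a later criterion
# (e.g., supervisor ratings of job performance). Data are illustrative.
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

test_scores = [52, 61, 70, 48, 66, 75, 58, 80]   # selection-test scores
ratings     = [ 3,  4,  5,  2,  4,  5,  3,  5]   # later performance ratings
print(round(pearson_r(test_scores, ratings), 2))
```

The point of the distinction in the text is that a predictive test is defended by this coefficient alone, whereas a criterion test must also be defensible item by item on content grounds.<br />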

Now where does this leave those of us who use our tests for both purposes? Naturally we must attempt to demonstrate that our tests have both content validity and predictive or concurrent validity. It appears that Hoffmann has never heard of empirical validation, but he manages to marshal a variety of defenses against the evidence. He particularly attacks the criteria against which we validate our tests. However, the value of a criterion is a very different question from the problem of whether a test can predict that criterion. Criticizing the tests simply<br />

175<br />



because we do not like the criteria is confusing the issue. Hoffmann’s concern with ambiguity is relevant only when the content validity of a test is the primary consideration. Even so, the customary item screening and selection in the construction of tests insures a reasonably satisfactory degree of content validity. This is particularly true in the case of our military tests which are designed to cover specific job areas.<br />

What about Hoffmann’s claim that the “deep” examinee is penalized by multiple-choice items? What he seems to be saying is that the very dull examinee will fail the item more frequently than the superficially bright examinee, but that the exceptionally bright examinee will have more trouble with the same item than the superficially bright examinee. Naturally, the brightest examinees are more apt to discover ambiguities than are other students. But Hoffmann’s ideas about the consequences of their perceptiveness do not hold up.<br />

To get back to Hoffmann’s assumptions, he states that tests, “...take account of only the choice of answer and not of quality of thought that led to the choice,” and “They neglect skill and disciplined expression.” This attribute he calls “quality of thought” is not defined, nor is any reliable and valid way of measuring it suggested. One might assume that Hoffmann would advocate use of the essay examination as a measure of his quality of thought. Such is not the case, for he outlines no less than four very convincing arguments why the essay test should not be used for that purpose. (1. Difficult to choose a topic fair to all; 2. difficult to determine whether the essay is actually relevant to the question; 3. difficult to overcome the problem of negative halo due to poor handwriting, spelling, etc.; 4. difficult to maintain consistency within and between scorers.) I often hear this quality of thought argument twisted around to<br />

say that just because a man scores high on the test does not necessarily mean that he can do the job. Most generally, those advancing this argument mean that the man will not do the job. While it may be true that a high score on an achievement or job knowledge type test does not automatically indicate a high degree of motivation, it is equally true that the low scorer cannot perform the job; that is, he does not have the knowledge to perform regardless of his motivation. Of course this depends on test validity and test purpose. It is up to us to define the purposes to which our instruments may be put and to determine the validity, for various purposes, of these instruments.<br />

At this point we might conclude that Hoffmann does not like any kind of testing. We would probably be correct. He criticizes objective tests; yet he leaves no alternatives for measurement. When he argues that the results of an Educational Testing Service study showing an essay test was less good than an objective examination are silly and could not possibly have been obtained, he is simply confusing content with predictive or concurrent validity.<br />
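The scorer-consistency problem behind comparisons of this kind can be quantified as inter-rater reliability, for example the correlation between two independent readers’ grades on the same essays. A minimal sketch follows; the reader names and grades are invented for illustration and are not from the study mentioned above.<br />

```python
# Hypothetical sketch: inter-rater reliability of essay grading, estimated
# as the Pearson correlation between two independent readers' scores.
# Data are illustrative, not taken from the ETS study cited in the text.
from statistics import mean, pstdev

reader_a = [78, 85, 62, 90, 71, 55, 83, 67]
reader_b = [70, 88, 70, 81, 65, 62, 74, 75]

ma, mb = mean(reader_a), mean(reader_b)
n = len(reader_a)
cov = sum((a - ma) * (b - mb) for a, b in zip(reader_a, reader_b)) / n
r = cov / (pstdev(reader_a) * pstdev(reader_b))
print(round(r, 2))  # agreement well below 1.0 signals inconsistent scoring
```

A coefficient well below 1.0 between equally qualified readers is exactly the fourth objection to essay tests listed earlier: the score depends partly on who reads the paper.<br />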

176<br />




Perhaps the potentially most damaging assumption, and the one which would be the most difficult for Hoffmann to sustain, has to do with the effect of tests on the identification of individual merit. He states, “They have a pernicious effect on education, and the recognition of merit.” This is of course wholly without foundation. There have been too many success stories for us to even bother to refute his claim.<br />

Now let us briefly turn to some of our other critics. A major assumption found throughout the writings of Packard, Barzun, Gross, LaPiere, Harrington, and others is that “mind” cannot be measured. They seem to consider “mind” a human “mystery system” outside the realm of scientific study. This may be true, but if “mind” is defined as behavior, it indeed can be measured. In fact, the measurement of man’s individual differences perhaps has been the greatest accomplishment of psychology thus far. We can assess the individuality of persons and make pretty good predictions about their future behavior.<br />

Our critics make an assumption exactly opposite to the assumption I just mentioned. They say that testing leads to conformity by picking persons who are all of the same type. This, of course, assumes that “mind” can be measured only too well -- in fact, considerably better than we are able to measure it. Anyway, there is considerable evidence to substantiate intra-individual trait variability. This principle of psychological testing is ignored by these critics. The fact that people do differ within themselves has long been recognized and submitted to careful study by psychologists.<br />

It is complained that testing is an invasion of privacy, and this criticism may have some merit. It is incumbent upon us to demonstrate the validity of any items which might otherwise be regarded as such an invasion. In the military setting we are seldom bothered by this problem, especially in proficiency evaluation.<br />

The critics very rarely suggest alternatives to psychological testing. Gardner, in his book, Excellence, says, “Anyone attacking the usefulness of tests must suggest workable alternatives. It has been proven over and over again that the alternative methods of evaluating ability are subject to gross errors and capable of producing grave injustices.” We have only to look at our rating systems to realize how truly Gardner spoke. Or we could return to the days when personnel decisions were made on the basis of hair color, family background, shape of the head, or some other equally intuitive basis. The alternatives to testing, when they are suggested, are clearly ridiculous.<br />

So far this morning, I have attempted to point out the merits of our critics’ charges against testing, and we can conclude that the protests with substance are those about which we already knew. Now we should take a look at the impact of our critics. I do not believe that the Military<br />



Establishment has become disillusioned with testing any more than I believe the general public has. This conference is evidence that the Armed Forces are among the strongest supporters of testing. The protestors have not caused any detectable effect in terms of reduced sales of tests or testing services. The number of letters from educators or the general public (to reputable test publishers and the APA) criticizing tests has not increased. Generally, the popular press has not jumped on the band wagon to foment a public outcry against us. Apparently, the net effect has been little more than a few books sold and a little high blood pressure among psychologists. The latter at least may not be such a bad thing. Some of us need to be aroused.<br />

If this is the case, then why should we even bother to consider these self-styled protectors of a tyrannized society? The writings of these critics should make us stop, look, and listen. If we are to avoid future trouble and improve the state of our art and science, we must improve the quality of our tests and services. We must adhere more strictly to the ethical standards of our profession. We must give greater attention to the technical characteristics of our tests and criteria. We must continue our efforts to expose and eliminate the quacks and incompetents who dwell about the fringe of psychological testing. And we must improve our communications about testing with every segment of our public which we can reach. These communications must be technically sound but they also must be written in understandable English. We might even form a committee within this association to draw up a pamphlet of general testing principles and practices which could be disseminated within the Military Establishment.<br />

In conclusion, the basic assumptions of our critics are erroneous and fallacious; they are generally based on lack of information, as apparently is the case of Hoffmann, or, more seriously, on a refusal to accept the strong empirical evidence showing that individuality can be accurately assessed in such a way as to give better recognition to real merit than has ever before been the case in our educational, industrial, or military institutions.<br />

Actually, we know that standardized objective testing is one of the great success stories of our day. This has been no better pointed out than by Gardner, in his book, Excellence (1961). Psychological testing for the first time enables us to look at the many facets of an individual rather than making judgments based on the so-called “lump of dough” doctrine. Now we can truly measure and assess the individuality of each of our military personnel and through careful guidance help each individual realize his potentialities as indicated by our psychological<br />


178<br />


testing instruments. Tests provide us with the best means available for assessing individual proficiency and discovering and rewarding individual merit. We can help the commander in the field individually and differentially, carefully examining his men so that he no longer must depend on his “feel” or gestalt for personnel decisions. This is our greatest strength. When the facts are laid out understandably, it cannot be disputed or refuted.<br />

179<br />




References<br />

Dunnette, M. D. Some methods for enhancing the validity of psychological prediction. Paper read at Symposium: Subgrouping Analysis as an Approach to the Prediction of Individual Behavior, St. Louis, Mo., 1962.<br />

Ethical standards of psychologists. Amer. Psychologist, 1963, 18, 56-60.<br />

Gardner, J. W. Excellence. New York: Harper, 1961.<br />

Hoffmann, B. The tyranny of testing. New York: Crowell-Collier, 1962.<br />

180<br />


Summary Report of the Steering Committee<br />

The 1964 Steering Committee discussions produced the following results:<br />

a. The present MTA emblem was considered inadequate. After viewing a number of sample emblems, a basic design was selected to be modified according to certain specifications. It was agreed that the emblem would be drawn up and submitted to the Services for further recommendations or acceptance.<br />

b. A draft copy of the MTA Bylaws was discussed in detail. The recommended changes will be incorporated in the redraft of the Bylaws which will be submitted to the Services. It is anticipated that a final copy will be presented to the membership at the 1965 MTA conference.<br />

c. Major Frank L. McLanethan of the Air Force was elected to the Chairmanship of the 1965 MTA Steering Committee.<br />

181<br />


Item Writers Aptitude Test Development Committee Report<br />

Date of Meeting: 21 October 1964.<br />

Place of Meeting: MTA Conference, Fort Benjamin Harrison, Indianapolis, Indiana.<br />

Committee Members: Mr. William W. Wance, USA, Chairman<br />
Lt Harvey C. Gregoire, USAF<br />
Mr. Charles A. Hudson, USN<br />
Mr. John D. Kraft, USA<br />
Major Joe R. Shafer, USAF<br />
Mr. William M. Minter, USA<br />

Summary of Discussion:<br />

a. All services represented gave a brief description of how their test item writers are selected. None of the services currently uses any special selection procedures other than subject-matter knowledge. The committee members felt that subject-matter background was not a serious problem since the item writers being designated for test item construction duties are generally adequate in this respect.<br />

b. In general, the item writers in all services perform similar duties. The Navy and Air Force have their item writers come to a central location where they work directly with test specialists. The Army operates on a more decentralized basis in that the item writers are located mainly at Army Service Schools and construct test items based on written specifications sent to them by the US Army Enlisted Evaluation Center. Test specialists from the latter location make TDY trips to the item-writing agencies to coordinate the test development efforts.<br />

c. The committee felt that the following two major areas should be investigated as being most likely to provide valid factors which would assist in screening the better from the poorer test item writers:<br />
(1) Personal history items.<br />
(2) Test results obtained from measuring factors, such as English usage, intelligence, and analytical reading ability.<br />

d. Some “brain storming” was done by the committee to get some indication of the types of personal history items or tests that might be used to assess item writers’ aptitude. However, the committee felt that a thorough and comprehensive analysis would need to be made of the item writer duties before the best grouping of potential personal history items and tests could be compiled for use in setting up an experimental design appropriate for conducting a validation study.<br />
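The validation study contemplated here would, in modern terms, correlate each candidate predictor with a criterion measure of item-writing quality. A minimal sketch follows; the predictor names, the ratings, and every figure are hypothetical illustrations, not data from any service.<br />

```python
# Hypothetical sketch of the screening study the committee outlines:
# correlate candidate predictors with a criterion rating of item-writing
# quality. All names and figures are invented for illustration.
from statistics import mean, pstdev

def validity(predictor, criterion):
    """Pearson correlation between a candidate predictor and the criterion."""
    mp, mc = mean(predictor), mean(criterion)
    cov = sum((p - mp) * (c - mc) for p, c in zip(predictor, criterion))
    n = len(predictor)
    return (cov / n) / (pstdev(predictor) * pstdev(criterion))

# Criterion: supervisor ratings of item-writing quality for 8 writers.
quality = [3, 5, 2, 4, 5, 1, 4, 2]

# Candidate predictors gathered for the same 8 writers.
predictors = {
    "english_usage":      [61, 75, 55, 68, 80, 50, 70, 58],
    "analytical_reading": [58, 72, 60, 65, 78, 48, 69, 55],
    "years_of_service":   [12,  4, 15,  8,  5, 20,  7, 14],
}

# Validity coefficient of each candidate predictor against the criterion.
for name, scores in predictors.items():
    print(name, round(validity(scores, quality), 2))
```

As the committee notes, the predictor battery can only be assembled sensibly after a job analysis shows which item-writer duties the criterion ratings should cover.<br />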


182<br />



e. Lt Gregoire, USAF, indicated that the SKT (Specialty Knowledge Test) Branch at the Personnel Research Laboratory at Lackland AFB was very much interested in the assessment of item writer aptitudes. He also indicated that sufficient data and processing support very likely would be available at Lackland AFB to conduct a preliminary study related to assessment of item writers.<br />

f. Tentative plans wer<br />



ROSTER OF CONFEREES<br />

Ens Harold Adaman<br />
Mr. Fred Allred<br />
Dr. Neal B. Andragg<br />
Lt Thomas H. Atchley<br />
Mr. Sam H. Baker<br />
Mr. Raymond V. Barat<br />
Mr. Vernon H. Begge<br />
Mr. Walter W. Birdsall<br />
Dr. Warren S. Blumenfeld<br />
Mr. Michael J. Bodi<br />
Lt Barbara Bole<br />
Mr. John S. Brand<br />
Mr. Claude P. Bridges<br />
Mr. Donald A. Brown<br />
2d Lt Martin S. Brown<br />
Mrs. Mabel O. Brunner<br />
Mr. Franklin S. Buckwalter, US Army Quartermaster School<br />
Maj Robert A. Burgess<br />
Mr. Anthony Canciglia<br />

UNIT ATTACHED<br />

US Coast Guard Headquarters<br />
US Coast Guard Training Center<br />
US Army Security Agency<br />
US Army Military Police Corps<br />
US Naval Examining Center<br />
US Army Enlisted Evaluation Center<br />
6570th Personnel Research Laboratory, US Air Force<br />
US Naval Examining Center<br />
US Naval Examining Center<br />
US Naval Personnel Research Activity, San Diego<br />
US Army Enlisted Evaluation Center<br />
US Naval Examining Center<br />
US Army Enlisted Evaluation Center<br />
US Army Enlisted Evaluation Center<br />
US Army Medical Service, Fitzsimons<br />
US Army Enlisted Evaluation Center<br />
Headquarters, US Air Force<br />
US Army Enlisted Evaluation Center<br />
US Army Security Agency<br />
US Army Enlisted Evaluation Center<br />


_’<br />

. ‘-<br />

-.<br />

I<br />

I<br />

:<br />

e .<br />

I. .I<br />

--.. .._ _<br />

L<br />

C<br />

.<br />

c<br />

--_I.---<br />

. ..-A.<br />

Capt James B. Carpenter<br />
Mr. Charles E. Cassidy<br />
Lt Col C. J. Chatroon<br />
Mr. Thomas E. Chandler<br />
SFC William Q. Chenn<br />
Lt Col Kent J. Collings<br />
Mr. John W. Crediford<br />
Maj David P. Culclasure<br />
Cdr R. J. Dahlby<br />
Maj Donald L. Diamond<br />
1st Lt Duncan L. Dieterly<br />
Col James C. Donaghey<br />
Dr. Henry J. Duel<br />
Mr. Erling A. Dukerschein<br />
MSgt Raymond R. Durand<br />
1st Lt James H. Durden<br />
Mr. Bernard J. Foley<br />
Mr. Stephen W. Fetler<br />
Mr. John L. Finucane<br />
Lt C. E. Gangenbach<br />
Mr. Ronald K. Goodnight<br />
Cdr Robert J. Gray<br />

UNIT ATTACHED<br />

6570th Personnel Research Laboratory, US Air Force<br />
US Army Enlisted Evaluation Center<br />
US Army Enlisted Evaluation Center<br />
US Army Southeastern Signal School<br />
US Army Combat Surveillance School<br />
US Army Enlisted Evaluation Center<br />
US Naval Examining Center<br />
US Army Medical Field Service<br />
US Coast Guard Training Center<br />
US Marine Corps Institute<br />
6570th Personnel Research Laboratory, US Air Force<br />
US Army Enlisted Evaluation Center<br />
Headquarters, US Air Force<br />
US Naval Examining Center<br />
US Army Security Agency Training Center<br />
US Army Ordnance Guided Missile School<br />
US Army Security Agency Training Center<br />
6570th Personnel Research Laboratory, US Air Force<br />
US Army Enlisted Evaluation Center<br />
US Army Enlisted Evaluation Center<br />
US Naval Examining Center<br />

185<br />


Capt Harry H. Greer, Jr.<br />
1st Lt Harvey Gregoire<br />
Ltjg Clyde A. Gronewold<br />
1st Lt Thomas H. Guback<br />
Mr. Robert L. Guy<br />
Cdr Frederick J. Hancox<br />
Mr. Clayton B. Haradon<br />
Lt Col Roy E. Harris, Jr.<br />
Capt R. M. Hayes<br />
Mr. Jack E. Hohreiter<br />
Mr. Fred S. Hona<br />
Mr. John C. Houts<br />
Mr. John J. Hubbell<br />
Mr. Charles A. Hudson<br />
Mr. Clifford E. Hutsley<br />
Mr. William L. Jackson<br />
Mr. E. C. Johnson<br />
Mr. L. W. Johnston<br />
Ltjg Katherine Kadenacy<br />
LCdr Lawrence R. Kilty<br />
2d Lt Lloyd O. Kimery<br />
Mr. Albert Kind<br />

UNIT ATTACHED<br />

US Navy (Retired)<br />
6570th Personnel Research Laboratory, US Air Force<br />
US Naval Examining Center<br />
US Army Defense Information School<br />
US Naval Examining Center<br />
US Coast Guard Headquarters<br />
6570th Personnel Research Laboratory, US Air Force<br />
Randolph Air Force Base<br />
US Naval Examining Center<br />
US Army Enlisted Evaluation Center<br />
US Army Enlisted Evaluation Center<br />
US Naval Examining Center<br />
US Army Armor School<br />
US Naval Examining Center<br />
US Army Signal School<br />
US Army Aviation School<br />
US Army Enlisted Evaluation Center<br />
US Naval Examining Center<br />
US Naval Examining Center<br />
US Naval Security Group Headquarters<br />
US Army Training Center, Engineers<br />
US Army Combat Surveillance School<br />

186<br />
_


Lt Col Albert S. Knauf<br />
Mr. Richard S. Kneisel<br />
Mr. John Kraft<br />
Cdr Gene L. Lane<br />
Mr. Larry J. LeBlanc<br />
LCdr Jack D. Lee<br />
Dr. C. L. John Legere<br />
1st Lt Robert H. Lenneville<br />
Lt Alexander A. Longo<br />
Mr. Charles J. Macaluso<br />
Capt Jack H. Marden<br />
Lt David R. Markey<br />
Capt Joseph P. Martin<br />
Mr. Curtis D. McBride<br />
Mr. William M. Minter<br />
Dr. Joseph E. Morsh<br />
Mr. Isadore J. Nevolan<br />
Mr. W. Alan Nicewander<br />
Ltjg Richard L. Olsen<br />
Maj Merrill R. Owen<br />
1st Lt Arnold J. Pals<br />
LCdr Ralph Palverky<br />

UNIT ATTACHED<br />

6570th Personnel Research Laboratory, US Air Force<br />
US Army Chemical Center and School<br />
US Army Enlisted Evaluation Center<br />
Bureau of Naval Personnel<br />
Lackland Air Force Base<br />
US Naval Examining Center<br />
US Army Security Agency Training Center<br />
Navy School of Music, Army Element<br />
US Naval Air Technical Training Center<br />
US Naval Examining Center<br />
US Army Judge Advocate General School<br />
US Coast Guard Training Center<br />
US Coast Guard Training Center<br />
US Army Artillery and Missile School<br />
US Army Chemical Center and School<br />
6570th Personnel Research Laboratory, US Air Force<br />
6570th Personnel Research Laboratory, US Air Force<br />
US Army Enlisted Evaluation Center<br />
US Coast Guard Training Center<br />
US Army Enlisted Evaluation Center<br />
US Army Medical Center, Walter Reed<br />
Naval Examination Center, Royal Canadian Navy<br />

187<br />


Mr. John J. Parke<br />
Mr. J. E. Partington<br />
Capt Kenneth A. Petrick<br />
Maj James N. Payne<br />
Mr. Henry W. Pepin<br />
Lt David L. Popple<br />
Mr. William H. Pitman<br />
Maj Joseph T. Polanski<br />
Capt Carl R. Powers<br />
Mr. Frank R. Price, Jr.<br />
Lt Col Robert A. Remsnyder<br />
Maj Clinton D. Regelis<br />
Mr. John H. Roths<br />
Mr. Jack Rubak<br />
Mr. Carl Rudinski<br />
1st Lt James L. Russell<br />
Maj William H. Salley<br />
Capt Clarence D. Sapp<br />
Mrs. Genevieve K. Schulter<br />
Maj Joe R. Shafer<br />
Mr. Jean B. Sheppard<br />
Mr. Edwin C. Shirkey<br />
SSgt Fredrick J. Shunk<br />

UNIT ATTACHED<br />

US Army Ordnance Guided Missile School<br />
US Army Enlisted Evaluation Center<br />
US Army Combat Surveillance School<br />
US Army Defense Atomic Support Agency<br />
US Army Defense Atomic Support Agency<br />
US Coast Guard Reserve Training Center<br />
US Army Ordnance Center and School<br />
US Army Military Police School<br />
US Defense Atomic Support Agency<br />
US Army Enlisted Evaluation Center<br />
US Army Enlisted Evaluation Center<br />
US Army Defense Information School<br />
Dept of Navy, Bureau of Naval Personnel<br />
US Army Defense Information School<br />
US Army Signal Center and School<br />
US Army Infantry School<br />
US Army Intelligence School<br />
US Army Air Defense School<br />
US Naval Examining Center<br />
6570th Personnel Research Laboratory, US Air Force<br />
US Army Southeastern Signal School<br />
US Army Enlisted Evaluation Center<br />
US Army Military Police School<br />

188<br />


Mr. William P. Sim<br />
Capt Loren K. Smith<br />
Mr. James W. Smith<br />
Mr. Maxon H. Smith<br />
Cdr Donald H. Tart<br />
Dr. James D. Teller<br />
LCdr Frances S. Turner<br />
Mr. Vern W. Urry<br />
Dr. Raymond O. Waldkoetter<br />
Mr. Francis B. Walsh<br />
Mr. William W. Wance<br />
Dr. Donald L. Wass<br />
Mr. Adalbert U. Weisbrod<br />
Lt Berl R. Williams<br />
Ens Thomas H. Wilson<br />
Mr. Castier S. Winiwiss<br />
Dr. Michael A. Zaccaria<br />

UNIT ATTACHED<br />

US Army Air Defense School<br />
US Army Intelligence School<br />
US Army Ordnance Guided Missile School<br />
US Army Ordnance Guided Missile School<br />
Royal Canadian Navy<br />
Headquarters, US Air Force<br />
US Coast Guard Training Center<br />
US Army Enlisted Evaluation Center<br />
US Army Enlisted Evaluation Center<br />
US Army Security Agency Training Center<br />
US Army Enlisted Evaluation Center<br />
US Army Enlisted Evaluation Center<br />
US Army Corps of Engineers<br />
US Coast Guard Training Center<br />
US Coast Guard Training Center<br />
US Naval Examining Center<br />
Lackland Air Force Base Training Center<br />

189<br />




UNCLASSIFIED/UNLIMITED<br />

PLEASE DO NOT RETURN<br />

THIS DOCUMENT TO DTIC<br />

EACH ACTIVITY IS RESPONSIBLE FOR DESTRUCTION OF THIS<br />

DOCUMENT ACCORDING TO APPLICABLE REGULATIONS.<br />

UNCLASSIFIED/UNLIMITED<br />
