Natural Language Item Response Theory and ... - University of Iowa

Natural Language Item Response Theory and the Assessment of Psychopathology
Kristian E. Markon
University of Iowa

Assessment of Psychopathology and Response Format
● Most tests and inventories use structured response format
    ● e.g., Likert (“true-false”, “1-5”)
    ● structured interview
● Simplifies quantification, but has limitations
    ● can be awkward, restrictive
    ● response meaning can be ambiguous

Assessment of Psychopathology and Response Format
● One possibility is to use natural language (i.e., free) format responses
    ● more flexible
    ● “information-rich”
● Quantification of such responses is expensive, however
    ● generally requires raters
    ● raters may require training, etc.

Natural Language Item Response Theory (NLIRT)
● Incredible progress in statistical modeling of natural language data; e.g.,
    ● spam filters
    ● text classification (e.g., Blei, Ng, & Jordan, 2003)
    ● essay scoring
● These methods can be used to develop natural language item response theory (NLIRT)

Natural Language Item Response Theory (NLIRT)
● NLIRT is like other types of item response theory:
    ● provides statistical model of item response
    ● quantifies measurement properties of items
    ● allows one to assign scores to individuals
● Unlike other forms of item response theory, is focused on natural language responses

Statistical Modeling of Natural Language Data
● Statistical modeling of natural language data comprises two basic steps:
    ● tokenization: process of creating tokens
    ● modeling: modeling of tokens
● Basically, breaking natural language responses into “modelable units”, and then modeling them
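The tokenization step can be illustrated with a minimal Python sketch. It assumes that a token is a single lowercased word; the function names `tokenize` and `token_counts` are hypothetical and are not taken from the presentation, which does not specify a tokenizer.

```python
import re
from collections import Counter


def tokenize(response):
    """Split a free-text response into lowercased word tokens.

    Assumption: a token is a run of letters, optionally containing one
    internal apostrophe (so "you've" stays a single token).
    """
    # Normalize typographic apostrophes so "you’ve" and "you've" match.
    text = response.lower().replace("\u2019", "'")
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text)


def token_counts(response):
    """Count how often each token occurs in a response (the "modelable units")."""
    return Counter(tokenize(response))


tokens = tokenize("Not turning in homework assignments on time.")
counts = token_counts("Drinking and driving, having my family see me come home drunk")
```

Word-level tokens are the simplest choice; more complex tokens (for example, word pairs) would change only the `tokenize` function, not the modeling step that follows.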

What are some things you’ve done that have been especially irresponsible?

Natural Language Item Response Model
● Like other IRT models, model probability of each response as function of underlying trait
● In NLIRT, response is token (e.g., word)
● Underlying trait is unobserved, latent
    ● inferred from patterns of responses across items
    ● single trait underlies responses to items

Natural Language Item Response Model
● For each item, model has two parameters for each response (i.e., each token in vocabulary):
    ● discrimination (a): how informative?
    ● severity (b): where is it informative?
● Very similar to nominal response model for polychotomous data

Natural Language Item Response Model
● Probability of a particular token being used is function of latent trait (θ) and token discrimination (a) and severity (b)
● Equal to the percent of the “propensity space” represented by that particular response.

$$P(w_i \mid \theta) = \frac{\exp[a_i(\theta - b_i)]}{\sum_j \exp[a_j(\theta - b_j)]} = \frac{\text{Propensity to use } w_i}{\text{Total propensity}}$$
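The "share of total propensity" idea can be sketched in a few lines of Python. This is an illustration only: it assumes the propensity to use token w_i is exp(a_i(θ − b_i)), and the three-token vocabulary and its (a, b) values are hypothetical, not estimates from the presentation.

```python
import math


def token_probabilities(theta, params):
    """Return P(token | theta) for every token in one item's vocabulary.

    params maps token -> (a, b), i.e. (discrimination, severity).
    Assumed propensity for a token: exp(a * (theta - b)).
    """
    logits = {w: a * (theta - b) for w, (a, b) in params.items()}
    # Subtract the largest logit before exponentiating for numerical stability;
    # this leaves the resulting probabilities unchanged.
    largest = max(logits.values())
    propensity = {w: math.exp(z - largest) for w, z in logits.items()}
    total = sum(propensity.values())
    return {w: p / total for w, p in propensity.items()}


# Hypothetical parameters for a single item (illustration only).
params = {
    "never": (-1.0, 0.0),
    "homework": (0.0, 0.0),
    "arrested": (1.5, 0.5),
}

low = token_probabilities(-1.0, params)   # respondent low on the trait
high = token_probabilities(1.0, params)   # respondent high on the trait
```

Under these made-up parameters, "never" takes the largest share of the propensity space at θ = -1 and "arrested" takes the largest share at θ = 1, which is the qualitative pattern the model is meant to capture.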



Example Responses (θ = -1.06)
Describe times you’ve had problems with the law:
“Never.”
What are some things you’ve done that have been especially irresponsible?
“Not turning in homework assignments on time.”

Example Responses (θ = 1.12)
Describe times you’ve had problems with the law:
“I stole a case of beer ... got an OWI, got arrested for driving without a license, noise complaints.”
What are some things you’ve done that have been especially irresponsible?
“Drinking and driving, having my [family] see me come home drunk”
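Trait scores like the θ values attached to these example responses come from fitting the model to a respondent's tokens. The presentation does not say which estimator was used, so the following Python sketch is only one plausible approach: a grid-search maximum a posteriori (MAP) estimate with a standard normal prior on θ, for a single item, assuming token propensities of the form exp(a(θ − b)). The vocabulary and parameter values are hypothetical.

```python
import math


def log_prob(token, theta, params):
    """Log P(token | theta), assuming propensity exp(a * (theta - b))."""
    logits = [a * (theta - b) for a, b in params.values()]
    largest = max(logits)
    # Log of the total propensity, computed stably (log-sum-exp).
    log_total = largest + math.log(sum(math.exp(z - largest) for z in logits))
    a, b = params[token]
    return a * (theta - b) - log_total


def estimate_theta(tokens, params):
    """Grid-search MAP estimate of theta with a standard normal prior.

    Tokens outside the item's vocabulary are ignored (an assumption).
    """
    grid = [i / 100 for i in range(-400, 401)]  # theta from -4.00 to 4.00
    known = [t for t in tokens if t in params]

    def log_posterior(theta):
        log_likelihood = sum(log_prob(t, theta, params) for t in known)
        return log_likelihood - 0.5 * theta ** 2  # prior, up to a constant

    return max(grid, key=log_posterior)


# Hypothetical (discrimination, severity) parameters, for illustration only.
params = {
    "never": (-1.0, 0.0),
    "homework": (0.0, 0.0),
    "arrested": (1.5, 0.5),
}

theta_low = estimate_theta(["never"], params)
theta_high = estimate_theta(["stole", "arrested", "arrested"], params)
```

The prior keeps the estimate finite when a response contains only tokens that all point in the same direction; a plain maximum likelihood estimate would run to the edge of the grid in that case.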

Criterion-Related Validity

Scale                     r      p
BFI
  Extraversion          -.07    .65
  Agreeableness         -.32    .02
  Conscientiousness     -.26    .05
  Neuroticism           -.07    .67
  Openness               .22    .18
ESI                      .44    .01
IDAS                    -.10    .72

Psychometric Insights Provided by NLIRT
● Psychometrically, natural language items are like testlets or “mini-tests”
● Also like items that you can respond to an indefinite number of times
● Evaluating natural language items may require concepts like item reliability

Summary and Future Directions
● Developing and using psychometrics of natural language response items is feasible
● Much work to be done:
    ● more complex tokens
    ● how to evaluate items, models
    ● refining scoring, quantification of measurement error
    ● joint use with fixed-format items

