The Journal of the World Council for Gifted and Talented Children

What is measurement invariance and why does it matter?


Put formally, measurement invariance (MI) examines “the extent to which items or subtests [or whole tests] have equal meaning across groups of examinees” (French & Finch, 2006, p. 379). If members of one group consistently score lower than members of another group, simply due to group membership, then the test is not yielding valid information and can be considered non-invariant. What constitutes a “group” is not universally accepted, because such definitions must be based on the theory underpinning the topic of interest. For example, many measures of academic achievement are highly correlated with family income (Valencia & Suzuki, 2001); because of this, income groups should be considered for MI/E evaluation in education. The importance of investigating MI/E with regard to a testing instrument cannot be overstated. An instance where this is especially important is the Naglieri Nonverbal Abilities Test (NNAT; Naglieri, 2003). This instrument has seen widespread use by schools attempting to identify gifted and talented students who come from a wide variety of backgrounds. Naglieri and Ford (2003) have claimed that the NNAT is culturally neutral, meaning that an individual’s membership in a traditionally underrepresented group has no bearing on the individual’s NNAT score. Put into MI/E language, this is a claim of NNAT invariance with regard to such factors as ELL, income, or racial/ethnic group status. When such a claim or assumption is a critical component, as with the NNAT, the importance of evaluating and presenting research on MI/E seems clear; this issue was noted in the Buros Mental Measurements Yearbook review of the NNAT (French, 2005). Without such information, practitioners and researchers have no way of knowing whether the NNAT is any more or less culturally loaded than traditional standardized tests of achievement or ability, or specific measures of giftedness and talent.
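The kind of non-invariance described above can be illustrated with a small simulation. In the sketch below (hypothetical numbers, not actual NNAT parameters), two groups are generated with identical latent ability, but the item’s intercept is shifted for the second group, so its members score lower purely because of group membership:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Both groups are drawn from the SAME latent ability distribution.
theta_a = rng.normal(0.0, 1.0, n)
theta_b = rng.normal(0.0, 1.0, n)

# Group A: observed item score = loading * theta + intercept + noise.
loading, intercept = 1.0, 10.0
score_a = loading * theta_a + intercept + rng.normal(0.0, 0.5, n)

# Group B: identical ability, but the item "works differently" for
# this group (intercept shifted down by 1.5 points) -- non-invariance.
score_b = loading * theta_b + (intercept - 1.5) + rng.normal(0.0, 0.5, n)

print(score_a.mean())  # close to 10.0
print(score_b.mean())  # close to 8.5 -- the gap reflects bias, not ability
```

The 1.5-point shift here is arbitrary; the point is only that the observed mean difference arises with no difference at all in the underlying construct.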

Typical ability or achievement tests often report how validity evidence was evaluated in the technical manual, as recommended in the Code of Fair Testing Practices in Education (JCTP, 2005) and the Standards for Educational and Psychological Testing (AERA, 1999). However, the reporting of MI/E or the evaluation of group differences has yet to see widespread inclusion in test manuals or reviews. This is a problem because test users (e.g., schools, teachers, parents, researchers) have no way of knowing whether the test yields equally valid information for one group of students as it does for another. In the example above, the NNAT might yield an average score 15 points lower for students from low-income families. This is not a hypothetical situation: it was observed in a study by Carman and Taylor (2010). Since the NNAT is not meant to measure income or SES, this could indicate non-invariance, or bias against that particular group.

Unfortunately, mean-difference testing and general linear model methods such as ANOVA are not sufficient to evaluate all aspects of a test that could suffer from non-equivalence, which is better evaluated using latent variable models such as those in the covariance structure or structural equation model family (Thompson & Green, 2006). Further, the origin of any observed non-equivalence can be difficult to determine. In the case of the Carman and Taylor (2010) study, the observed differences could be due either to actual group differences on the underlying construct, or to the instrument simply not working in the same way for members of different income groups (non-invariance). Without some kind of MI/E evaluation, there is no way to know. As the NNAT stands, users and test-takers cannot know whether the instrument yields equally valid information for students from dominant and non-dominant cultural groups alike.
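Why a mean comparison alone cannot settle the question can be shown in a short sketch (illustrative numbers only). The two generating scenarios below produce essentially the same observed group mean, yet only the second involves non-invariance; an ANOVA on observed scores cannot tell them apart:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Scenario A: a real difference on the latent construct, measured by
# an invariant (unbiased) item: loading 1.0, intercept 0.0.
theta_low = rng.normal(-1.0, 1.0, n)              # genuinely lower ability
obs_a = 1.0 * theta_low + 0.0 + rng.normal(0.0, 0.5, n)

# Scenario B: equal latent ability, but an intercept biased by -1.0
# for this group (non-invariance).
theta_equal = rng.normal(0.0, 1.0, n)
obs_b = 1.0 * theta_equal - 1.0 + rng.normal(0.0, 0.5, n)

# The observed means are indistinguishable, so a mean-difference test
# cannot reveal which scenario generated the data.
print(round(obs_a.mean(), 2), round(obs_b.mean(), 2))
```

Separating the two scenarios requires modeling the measurement structure itself, which is what the latent variable approaches cited above provide.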

Conducting measurement invariance testing using confirmatory factor analysis

Evaluating MI/E can be approached from several different perspectives. Two of the most common approach the issue from the item response theory (IRT) or structural equation modeling (SEM) perspective, the latter being the confirmatory factor analysis example presented here. Those interested in the IRT approaches should consult the book by Osterlind and Everson (2009) or the seminal work by Lord (1980).

The first step in conducting MI/E testing using SEM is to specify the model to be tested. In most cases this is simple, as items on a test are written to assess certain factors or subscales. For example, the NNAT reports a single global score but includes four types of items: pattern completion, reasoning by analogy, serial reasoning, and spatial visualization. Specifying which items are meant to assess which factors serves as the model to be tested for each group separately. Even if there is only a single factor (perhaps the general ability or “g” factor), this is still a theoretical model

Gifted and Talented International – 26(1), August, 2011; and 26(2), December, 2011.
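The specification step can be sketched minimally as follows, assuming hypothetical item labels (the text does not list the NNAT’s actual item identifiers). Each of the four item types is mapped to the items written to assess it, and the mapping is rendered in the lavaan-style `factor =~ item + item` measurement syntax accepted by common SEM packages (e.g., lavaan in R or semopy in Python):

```python
# Hypothetical item labels for illustration only; the NNAT's real
# item identifiers are not given in the text.
model = {
    "pattern_completion":    ["pc1", "pc2", "pc3"],
    "reasoning_by_analogy":  ["ra1", "ra2", "ra3"],
    "serial_reasoning":      ["sr1", "sr2", "sr3"],
    "spatial_visualization": ["sv1", "sv2", "sv3"],
}

# Render the mapping as lavaan-style measurement-model syntax. This
# same model is then fit to each group separately (the configural
# baseline) before equality constraints are added across groups.
syntax = "\n".join(
    f"{factor} =~ {' + '.join(items)}" for factor, items in model.items()
)
print(syntax)
```

Keeping the specification as data (the dictionary) rather than a hand-written string makes it easy to generate the increasingly constrained models that later invariance steps compare.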
