The Journal of the World Council for Gifted and Talented Children

What is measurement invariance and why does it matter?


Put formally, measurement invariance (MI) examines “the extent to which items or subtests [or whole tests] have equal meaning across groups of examinees” (French & Finch, 2006, p. 379). If members of one group consistently score lower than members of another group, simply due to group membership, then the test is not yielding valid information and can be considered non-invariant. What constitutes a “group” is not universally accepted, because such definitions must be based on the theory underpinning the topic of interest. For example, many measures of academic achievement are highly correlated with family income (Valencia & Suzuki, 2001); because of this, income groups should be considered for MI/E evaluation in education. The importance of investigating MI/E with regard to a testing instrument cannot be overstated. An instance where this is especially important is the Naglieri Nonverbal Abilities Test (NNAT; Naglieri, 2003). This instrument has seen widespread use by schools attempting to identify gifted and talented students who come from a wide variety of backgrounds. Naglieri and Ford (2003) have claimed that the NNAT is culturally neutral, meaning that an individual’s membership in a traditionally underrepresented group has no bearing on the individual’s NNAT score. Put into MI/E language, this is a claim of NNAT invariance with regard to such factors as ELL, income, or racial/ethnic group status. When such a claim or assumption is a critical component, as with the NNAT, the importance of evaluating and presenting research on MI/E seems clear; this issue was noted in the Buros Mental Measurements Yearbook review of the NNAT (French, 2005). Without such information, practitioners and researchers have no way of knowing whether the NNAT is any more or less culturally loaded than traditional standardized tests of achievement or ability, or specific measures of giftedness and talent.
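The kind of non-invariance described above can be illustrated with a small simulation. In the sketch below (hypothetical numbers, not actual NNAT parameters), two groups are generated with identical latent ability, but the item’s intercept is shifted for the second group, so its members score lower purely because of group membership:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Both groups are drawn from the SAME latent ability distribution.
theta_a = rng.normal(0.0, 1.0, n)
theta_b = rng.normal(0.0, 1.0, n)

# Group A: observed item score = loading * theta + intercept + noise.
loading, intercept = 1.0, 10.0
score_a = loading * theta_a + intercept + rng.normal(0.0, 0.5, n)

# Group B: identical ability, but the item "works differently" for
# this group (intercept shifted down by 1.5 points) -- non-invariance.
score_b = loading * theta_b + (intercept - 1.5) + rng.normal(0.0, 0.5, n)

print(score_a.mean())  # close to 10.0
print(score_b.mean())  # close to 8.5 -- the gap reflects bias, not ability
```

The 1.5-point shift here is arbitrary; the point is only that the observed mean difference arises with no difference at all in the underlying construct.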

Typical ability or achievement tests often report how validity evidence was evaluated in the technical manual, as recommended in the Code of Fair Testing Practices in Education (JCTP, 2005) and the Standards for Educational and Psychological Testing (AERA, 1999). However, the reporting of MI/E or the evaluation of group differences has yet to see widespread inclusion in test manuals or reviews. This is a problem because test users (e.g., schools, teachers, parents, researchers) have no way of knowing whether the test yields equally valid information for one group of students as it does for another. In the example above, the NNAT might yield an average score 15 points lower for students from low-income families. This is not a hypothetical situation: it was observed in a study by Carman and Taylor (2010). Since the NNAT is not meant to measure income or SES, this could indicate non-invariance, or bias against that particular group.

Unfortunately, mean-difference testing and general linear model methods such as ANOVA are not sufficient to evaluate all aspects of a test that could suffer from non-equivalence, which is better evaluated using latent variable models such as those in the covariance structure or structural equation model family (Thompson & Green, 2006). Further, the origin of any observed non-equivalence can be difficult to determine. In the case of the Carman and Taylor (2010) study, the observed differences could be due either to actual group differences on the underlying construct, or to the instrument simply not working in the same way for members of different income groups (non-invariance). Without some kind of MI/E evaluation, there is no way to know. As the NNAT stands, users and test-takers cannot know whether the instrument yields equally valid information for students from dominant and non-dominant cultural groups alike.
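Why a mean comparison alone cannot settle the question can be shown in a short sketch (illustrative numbers only). The two generating scenarios below produce essentially the same observed group mean, yet only the second involves non-invariance; an ANOVA on observed scores cannot tell them apart:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Scenario A: a real difference on the latent construct, measured by
# an invariant (unbiased) item: loading 1.0, intercept 0.0.
theta_low = rng.normal(-1.0, 1.0, n)              # genuinely lower ability
obs_a = 1.0 * theta_low + 0.0 + rng.normal(0.0, 0.5, n)

# Scenario B: equal latent ability, but an intercept biased by -1.0
# for this group (non-invariance).
theta_equal = rng.normal(0.0, 1.0, n)
obs_b = 1.0 * theta_equal - 1.0 + rng.normal(0.0, 0.5, n)

# The observed means are indistinguishable, so a mean-difference test
# cannot reveal which scenario generated the data.
print(round(obs_a.mean(), 2), round(obs_b.mean(), 2))
```

Separating the two scenarios requires modeling the measurement structure itself, which is what the latent variable approaches cited above provide.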

Conducting measurement invariance testing using confirmatory factor analysis

Evaluating MI/E can be approached from several different perspectives. Two of the most common approach the issue from the item response theory (IRT) or structural equation modeling (SEM) perspective, the latter being the confirmatory factor analysis example presented here. Those interested in the IRT approaches should consult the book by Osterlind and Everson (2009) or the seminal work by Lord (1980).

The first step in conducting MI/E testing using SEM is to specify the model to be tested. In most cases this is simple, as items on a test are written to assess certain factors or subscales. For example, the NNAT reports a single global score but includes four types of items: pattern completion, reasoning by analogy, serial reasoning, and spatial visualization. Specifying which items are meant to assess which factors serves as the model to be tested for each group separately. Even if there is only a single factor (perhaps the general ability or “g” factor), this is still a theoretical model

Gifted and Talented International – 26(1), August, 2011; and 26(2), December, 2011.
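The specification step can be sketched minimally as follows, assuming hypothetical item labels (the text does not list the NNAT’s actual item identifiers). Each of the four item types is mapped to the items written to assess it, and the mapping is rendered in the lavaan-style `factor =~ item + item` measurement syntax accepted by common SEM packages (e.g., lavaan in R or semopy in Python):

```python
# Hypothetical item labels for illustration only; the NNAT's real
# item identifiers are not given in the text.
model = {
    "pattern_completion":    ["pc1", "pc2", "pc3"],
    "reasoning_by_analogy":  ["ra1", "ra2", "ra3"],
    "serial_reasoning":      ["sr1", "sr2", "sr3"],
    "spatial_visualization": ["sv1", "sv2", "sv3"],
}

# Render the mapping as lavaan-style measurement-model syntax. This
# same model is then fit to each group separately (the configural
# baseline) before equality constraints are added across groups.
syntax = "\n".join(
    f"{factor} =~ {' + '.join(items)}" for factor, items in model.items()
)
print(syntax)
```

Keeping the specification as data (the dictionary) rather than a hand-written string makes it easy to generate the increasingly constrained models that later invariance steps compare.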
