Report on Knowledge Assessment

Construction and Validation for

New Case Manager Certification

April 2007

PREPARED FOR

The Tennessee Department of Children’s Services

BY

JULIANNA MAGDA, MS

SISSIE HADJIHARALAMBOUS, PhD

THE UNIVERSITY OF TENNESSEE

COLLEGE OF SOCIAL WORK

OFFICE OF RESEARCH AND PUBLIC SERVICE


REPORT ON KNOWLEDGE ASSESSMENT CONSTRUCTION AND VALIDATION

FOR NEW CASE MANAGER CERTIFICATION

The University of Tennessee

College of Social Work

Office of Research and Public Service

KAREN SOWERS, DEAN

PAUL CAMPBELL, DIRECTOR

The University of Tennessee does not discriminate on the basis of race, sex,

color, religion, national origin, age, disability or veteran status in provision of

educational programs and services or employment opportunities and benefits.

This policy extends to both employment by and admission to the University.

The University does not discriminate on the basis of race, sex, or disability in its

education programs and activities pursuant to the requirements of Title VI of the

Civil Rights Act of 1964, Title IX of the Education Amendments of 1972,

Section 504 of the Rehabilitation Act of 1973, and the Americans with

Disabilities Act (ADA) of 1990.

Inquiries and charges of violation concerning Title VI, Title IX, Section 504,

ADA or the Age Discrimination in Employment Act (ADEA) or any of the other

above referenced policies should be directed to the Office of Equity and

Diversity (OED), 1840 Melrose Avenue, Knoxville, TN 37996-3560, telephone

(865) 974-2498 (V/TTY available) or 974-2440. Requests for accommodation of

a disability should be directed to the ADA Coordinator at the UTK Office of

Human Resources, 600 Henley Street, Knoxville, TN 37996-4125.

The University of Tennessee, Knoxville, in its efforts to ensure a welcoming

environment for all persons, does not discriminate on the basis of sexual

orientation in its campus-based programs, services, and activities. Inquiries and

complaints should be directed to the Office of Equity and Diversity.

Project # 07048


Table of Contents

List of Tables
List of Figures
Introduction
    Background
    Development of Knowledge Assessment Objectives and Specifications/Blueprint
    Item Writing and Item Editing
    Field Administration
Item Analysis
Methodology
    Classical Test Theory
    Initial Analysis
Number of Items Assessed
Future Directions
    Item Response Theory (IRT)
Conclusion
    Comparing Classical Test Theory and Item Response Theory
    Exploring Exam Bias
References
Appendix A: Item Analysis Summary
Appendix B: Comparison of Item Performance in End of Course 4, Version 1 and Version 4
Appendix C: Comparison of Item Performance in End of Course 4 Version 1, Version 3, and Version 4 for Selected Items


List of Tables

Table 1. Overall Performance and Recommendations for Item Revision


List of Figures

Figure 1. Comparison of Item Performance Over Time in Course 4 Post-Assessment

Figure 2. Item Characteristic Curve and CTT Item Statistics of a Good Item According to Both Analyses

Figure 3. Item Characteristic Curve and CTT Item Statistics of an Item that Proved To Be Good by IRT and Poor by CTT

Figure 4. Item Characteristic Curve and CTT Item Statistics of an Item that Proved To Be Poor by IRT and Moderate by CTT


Introduction

The Adoption and Safe Families Act (ASFA) of 1997 (Public Law 105-89) was

designed to prevent children in foster care from being returned to unsafe homes

and to find safe homes for children who are not able to return to their families.

Since then, the Tennessee Department of Children’s Services (TDCS), like many

other states, has revamped preservice training for new frontline staff hired to

work with children and families in an effort to better prepare child welfare

workers to fulfill ASFA’s goals of safety, permanence, and well-being. More

recent revisions to preservice training were implemented in the summer and fall

of 2004, as the agency embraced a new best practice model of child welfare 1 and

at the same time tried to address deficiencies in key case manager competencies

previously identified in a statewide needs assessment. Three primary themes

form the foundation of the new outcomes-based preservice training: family-centered focus, strengths-based approach, and cultural sensitivity.

Four weeks of classroom teaching are combined with four weeks of on-the-job

training (OJT) to allow new workers to build skills identified as critical before

assuming an independent caseload. Throughout classroom training and during

on-the-job training, knowledge and skills assessments are embedded to help

identify worker strengths and continuing needs. A certification requirement for

all newly hired case managers before assignment of a caseload provides an

additional mechanism for accountability in service delivery and illustrates the

department’s commitment to the importance of continued professional

development. The certification requirement includes both a knowledge-based

assessment and a skills assessment. The purpose of this document is to describe

implementation issues related to the knowledge-based assessment requirement in

1 For a detailed description of the best practice model see Tennessee Department of Children’s

Services. (2003). Standards of Professional Practice for Serving Children and Families: A Model

of Practice.


the newly hired case managers’ certification program. 2 Particular attention is

given to description of the psychometric analysis that is underway at both the

item level and the test level to ensure that the assessment provides a valid

measure of cognitive knowledge domains relevant to the job. 3 While knowledge alone is not sufficient for quality work with children and families,

there is an implicit assumption that knowledge provides a foundation for the new

worker to draw upon for building skills needed for the job.

Background

Tennessee’s new case manager training is a 9-week program. It uses an

alternating-week structure: one week in the classroom, with workers across

program areas trained conjointly until they reach the last (4th) week of classroom

training, and one week in the worker’s actual field setting. An on-the-job training

coach is assigned to each new worker during orientation; the coach assists the

new worker in putting together a professional development team that works

collectively to ensure that an individualized learning plan is in place for the new

worker. The plan is regularly updated based on observations and results of

assessments embedded throughout the training. Supervisors and experienced staff

interested in mentoring new workers serve on professional development teams

along with the new worker, his or her OJT coach, and the classroom trainer.

The first in-class course of preservice training is built on a model of the helping

process that includes core conditions along with engagement and helping skills

around attending, balanced use of questions, empathic reflection of content and

emotions, concretizing, and summarizing. The second in-class course focuses on

gathering information for assessing safety, permanence, well-being, resources

available to support families, and critical thinking skills for analyzing

information gathered. The third in-class course focuses on the development of

individualized case plans that are built upon a family’s identified strengths and

needs along with monitoring and guidelines for updates of case plans. Finally, the

fourth in-class week focuses on the new worker’s program area with Child

Protective Services workers in one track and Permanence workers in a second

track. At the end of each in-class course there is a knowledge assessment that

consists of 20 multiple choice items.

The post-course knowledge assessment is designed to enable the trainer, the new

worker (trainee), and the new worker’s professional development team to track

2 Development and validation issues related to the skills assessment are discussed in a separate

document.

3 Additional information related to validation efforts for the knowledge assessment that involve

primarily qualitative input gathered from ongoing monthly meetings with a panel of field experts

can be furnished upon request.


the trainee’s progress. It is intended to do so in a fashion that parallels the

philosophy of practice embodied in the new worker certification training.

Assessment is viewed as developmental, seeking to encourage continuous

enhancement of knowledge. The end of course knowledge assessment also

allows the new worker to familiarize himself or herself with the types of

questions included in the final knowledge assessment required for certification

before assuming an independent caseload. Items for both assessments are drawn

from the same item bank. As “test anxiety” increases with high-stakes

assessments, the hope is to alleviate some of the worker’s stress by building

familiarity with the knowledge assessment process throughout preservice

training. The end of course knowledge assessments consist of 20 items, and the

end of preservice final assessment consists of 120 items (30 items corresponding

to each of the 4 classroom weeks).

Development of Knowledge Assessment

Objectives and Specifications/Blueprint

In the development of any knowledge-based assessment, it is essential to specify

as clearly as possible the domain of content or behaviors that define the

objectives measured by the instrument. With a certification or licensure exam, it

is common practice to conduct a “role delineation study” or “task analysis” first,

with individuals working in the field identifying the responsibilities, sub-responsibilities,

and activities that define each role/task. Next, the knowledge and

skills needed to carry out each task are identified, and later a panel of experts

validates the list of desirable knowledge and skills. 4 The validated or approved

list of knowledge and skills comprises the specific objectives that need to be

measured.

In the initial stage of development, evaluators from the University of Tennessee

College of Social Work Office of Research and Public Service (SWORPS)

identified a list of key competencies in consultation with members of a Technical

Assistance Committee (TAC) and the Curriculum Development Team. SWORPS

evaluators conducted a task analysis session with experienced TDCS case

managers who worked in an urban setting. 5 The purpose of the session was to

begin the process of validating the key competencies initially identified during

the session. Case managers were invited to comment on the accuracy of the listed

4 To make development more efficient, desirable knowledge and skills were identified during the

task analysis process described here. The results of the analysis were also utilized in the

development of the skills assessment (to avoid reconvening groups of workers).

5 A similar session was not repeated in a rural area. It is assumed here that even though time

allocated to various daily tasks may vary between rural and urban frontline workers, desired key

competencies for good casework are similar in the two settings.


competencies. During the latter part of the discussion, participants were asked to

articulate specific responsibilities and activities/tasks related to each competency.

SWORPS evaluators then facilitated a task analysis session with TDCS frontline

supervisors in order to continue the process of validating the key competencies

initially identified. Information from the case managers’ task analysis session

was shared with TDCS frontline supervisors. Supervisors acted as a panel of

experts, providing additional insights about what constitutes key competencies

for case managers who serve children and families. In turn, SWORPS evaluators

drafted a blueprint for the new worker knowledge assessment based on the results

of the task analysis. Additional steps to refine this initial draft included reviewing the learning objectives as outlined in each unit of the preservice curriculum and attending pilot sessions in the classroom to help evaluators gain a

sense of the emphasis placed on various concepts. The revised blueprint for the

assessment was submitted to the Technical Advisory Committee for approval

before further test development steps were taken.

Item Writing and Item Editing

Using preservice training material, SWORPS evaluators developed a large pool

of items that was utilized for piloting the knowledge assessment process. More

specifically, the number of items in the initial pool used for piloting was

approximately three times greater than the number of items needed for a single

final knowledge assessment administration (i.e., for a 120-item test, the initial

item bank consisted of approximately 360 items). 6 The distribution of pool items

across various content areas was based on the importance assigned to various

learning outcomes in the test blueprint.

Colleagues familiar with the revised preservice curriculum reviewed the pool of

items developed by individual evaluators. Special attention was given to the

following issues during the review process:

1. Does each item measure an important learning outcome included in the

blueprint specifications?

2. Does each item present a clearly formulated problem?

3. Is the item stated in simple, clear, non-biased language? Is terminology

consistent with preservice coursework and policy?

4. Is the item free from extraneous clues leading to the correct answer?

5. Is the difficulty of the item appropriate?

6 The Standards for Educational and Psychological Testing (1999) recommend developing at least three to four times the number of test items required to construct an examination.


6. Do the items included in the assessment provide adequate coverage of

the blueprint specifications?

Once the initial editing of pool items was completed, items were submitted to

preservice curriculum developers in an effort to identify additional weaknesses.

The review was based on the same guidelines enumerated above. SWORPS

evaluators incorporated feedback and made revisions as needed. Items were

stored in an item bank database with sub-pools created for each preservice

classroom course. Also stored were links for each item to specific competencies

and unit objectives in the preservice curriculum.

Field Administration

SWORPS evaluators developed a document with directions designed to

communicate the following information to all trainees:

1. Purpose of the end of course assessment;

2. Time allowed to complete the end of course assessment;

3. How to record answers; and

4. Whether to guess when in doubt about the answer.

Directions were included in trainees’ knowledge assessment booklets, along with scripted verbal guidelines that trainers were asked to share in class prior to administering the assessment, to ensure consistency across settings.

Over time, multiple versions of the knowledge assessment have been developed

for use at the end of the classroom courses. The remainder of this document

describes how data gathered from early piloting of items have been used to

continuously improve the quality of individual items in the item bank and the

overall quality of the instruments used at the end of each course.


Item Analysis

Psychometrics is a field of study concerned with the theory and technique of

psychological measurement, which includes the measurement of knowledge,

abilities, attitudes, and personality traits. There are two well-known

methodologies in psychometric study used to achieve the goal of item validation:

♦ Classical test theory (CTT) uses traditional sample-dependent statistics.

These include, but are not limited to, item difficulty and item

discrimination indices, item-test intercorrelations, and distractor analysis.

CTT analysis employs relatively simple mathematical techniques, thus

explaining its wide use and popularity in test validation.

♦ Item Response Theory (IRT), also known as modern test theory, is the

study of test and item scores based on assumptions concerning the

mathematical relationship between ability and item responses. IRT

models are known as “strong” models since the assumptions are harder

to meet. IRT is the gold standard for item validation, mainly because of

its property of group invariance, or the fact that the calculated item

parameters are independent of the ability level of examinees responding

to the item.


Methodology

Although IRT is the preferred method for item validation, it is necessary to rely

on CTT in early steps of the analysis. IRT requires large sample sizes that were

not available at the beginning of program implementation. Therefore, CTT was the only viable option for instruments administered to small samples. As the number of examinees completing the same end-of-course assessment increased over time, evaluators utilized IRT models to build additional support for the validity of the designed measures.

Classical Test Theory

The goal in item analysis is to construct an assessment that has the necessary

degree of reliability and validity. To achieve this, a large number of items were

analyzed for their psychometric properties. In developing the assessment, CTT

was used to make judgments about item quality. The goal was to identify items that were functioning well and those that were not. Items with poor characteristics were revised or eliminated. Classical

test theory procedures involve selecting items using the item difficulty index,

corrected point biserial coefficient, index of discrimination, and distractor

analysis.

♦ Item difficulty index is the proportion of examinees who answered the

item correctly; lower percentages reflect higher item difficulty. In

general, for an item to discriminate well between examinees, the item

difficulty should not be too high or too low. Extremely low values may

indicate that the question is too difficult, is poorly written, or has problems

with item content. Questions with a high item difficulty index are

avoided, as they may be too easy and not measure knowledge

acquisition.


♦ Point biserial correlation computes the correlation between item

response and total test score (r pbi ). The higher the value of r pbi , the

stronger the relationship between the item response and total score. Point

biserial can range from –1.00 to 1.00, similar to Pearson’s product-moment correlation. It is sometimes more valuable to compute the

correlation excluding the particular test item from the overall test score.

This statistic is called corrected point biserial correlation. There are

other correlation coefficients that can be computed: biserial, phi, and

tetrachoric. “Possibly the use of point biserial might tend to produce a

more reliable test for groups exactly like the pretest group, whereas

biserial might work better for subsequent groups of examinees that differ

somewhat from the pretest group” (Lord & Novick, 1968, p. 344). In this

study, it is assumed that future worker groups will be similar in ability to

the original sample group.

♦ Index of Discrimination. Overall assessment scores are dichotomized

into high and low scorers in order to compute a stable item

discrimination index. The upper 27% and the lower 27% of the examinee group are constructed. The difference between the

percentage of examinees in the top 27% who correctly answered the test

item and the percentage in the bottom 27% who correctly answered the

item is called the discrimination index. Ideally, high scorers select the correct answer, while a large proportion of low scorers choose one of the distractors.

♦ Distractor Analysis. The purpose of distractor analysis is to refine the items. This process involves looking at frequencies for each response to

an item. Distractors that are not chosen by any examinees should be

revised or eliminated. The group in the bottom 27% should select

incorrect options in greater proportion than the upper 27%. Items created

in such a way that a single distractor is selected more often than others or

more often than the correct answer should be revised.

Parameters that are suggested in psychometric literature are used. These “rules of

thumb” can be adjusted up or down, according to circumstances. For this study, a

good item is identified by an item difficulty index of 0.20 to 0.80, a corrected

point biserial correlation coefficient ≥ 0.09, and a discrimination index ≥ 0.30. An item receives one point toward its overall performance score for each of the three statistics it satisfies. Thus, overall performance ranges from 0 (none of the three statistics meets its threshold) to 3 (all three statistics meet their thresholds).

As can be seen in Table 1, a recommendation based on overall performance

follows each item. These recommendations are taken into consideration as efforts

are made to improve individual item performance and overall exam performance.


Table 1. Overall Performance and Recommendations for Item Revision

Overall Performance            Recommendation
3 (Performed well)             Retain without revision
2 (Performed moderately)       Based on one poor item statistic; most of the time needs little revision
0 or 1 (Performed poorly)      Based on poor item statistics; discard or revise the item
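To make the three item statistics and the 0 to 3 scoring rule concrete, the following minimal sketch (illustrative only, not the evaluators' actual analysis code) computes them for a single item. It assumes a scored 0/1 response matrix with one row per examinee and one column per item, and it applies the thresholds quoted above.

    # Minimal, illustrative sketch of the CTT item statistics described in this report.
    # Assumes `scores` is a 0/1 NumPy array: one row per examinee, one column per item.
    import numpy as np

    def ctt_item_stats(scores, item):
        """Difficulty, corrected point biserial, discrimination, and overall performance for one item."""
        n = scores.shape[0]
        item_scores = scores[:, item]

        # Difficulty index: proportion of examinees who answered the item correctly.
        difficulty = item_scores.mean()

        # Corrected point biserial: correlation between the item response and the
        # total score computed with this item excluded.
        rest_total = scores.sum(axis=1) - item_scores
        corrected_pbi = np.corrcoef(item_scores, rest_total)[0, 1]

        # Discrimination index: proportion correct in the top 27% of total scorers
        # minus proportion correct in the bottom 27%.
        order = np.argsort(scores.sum(axis=1))
        k = max(1, int(round(0.27 * n)))
        discrimination = item_scores[order[-k:]].mean() - item_scores[order[:k]].mean()

        # One point for each statistic meeting the thresholds used in this report:
        # difficulty 0.20-0.80, corrected point biserial >= 0.09, discrimination >= 0.30.
        performance = (int(0.20 <= difficulty <= 0.80)
                       + int(corrected_pbi >= 0.09)
                       + int(discrimination >= 0.30))
        return difficulty, corrected_pbi, discrimination, performance

A distractor analysis would supplement these statistics by tabulating, for each response option, how often it was selected by the upper and lower 27% groups.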

Initial Analysis

The assessment developed to provide feedback to participants at the end of Course 4 (Conducting Family-Centered Assessments) is selected here to illustrate how item analysis was used to assess and improve the overall performance of items.

The process was similar for other instruments.

Four different versions of the Course 4 assessment were developed over time.

Different versions are produced for many reasons. Early on, results from a first

version can be used for strengthening the quality of the items. At other times, a

new version may be needed because of changes to the classroom curriculum.

Finally, multiple versions are needed as items become “overexposed”: participants may answer an item correctly because someone who completed the training earlier shared the correct answer.

The first version of the Course 4 post-assessment was used with 122 examinees.

Version 2 was used only with 16 examinees as the curriculum changed based on

field needs, and evaluators had to adjust the content of the post-assessment.

Versions 3 and 4 were used with 246 and 453 examinees, respectively. Figure 1

presents the item improvement from Version 1 to Version 4 (Version 2 results

are not included due to a small number of examinees). As indicated in Figure 1,

there is an increase in the proportion of items performing moderately well and

well over the three versions. Conversely, there is a decrease in the proportion of

items performing poorly. The overall conclusion is that item performance

successfully improved with each version.


Figure 1. Comparison of Item Performance Over Time in Course 4 Post-Assessment

[Bar chart comparing the percentage of items performing moderately well or well versus poorly: Version 1, 55% vs. 45%; Version 3, 70% vs. 30%; Version 4, 75% vs. 25%.]

A more detailed item analysis shows that Version 1 had 7 items (35%) that

performed well (overall performance = 3), 4 items (20%) with moderate

performance (overall performance = 2) and 9 items (45%) were classified with

poor performance (overall performance = 0 or 1).

Version 3 had 12 items (60%) that performed well, 2 items (10%) that performed

moderately, and 6 items (30%) that performed poorly.

Version 4 had 11 items (55%) that were judged as having performed well. Four

items (20%) performed moderately, and 5 items (25%) had a poor performance.

Appendix A presents a summary of item statistics for Course 4, Version 4. For

each item, the difficulty index, the corrected point biserial coefficient, the

discrimination index, and an overall performance rating are reported. Based on the

overall performance rating of each item, recommendations for retention or

revision were given.

Appendix B presents a comparison of end of Course 4 item overall

performance from Version 1 to Version 4. From Version 1 to Version 4, 15 items

were changed (75%). Within the group of changed items, 7 items improved their

overall performance and 4 performed well on both occasions; 2 items dropped

their performance and 2 items performed poorly on both occasions. The amount

of change varied. Slightly changed items had distractors added, changed, or

dropped. In contrast, for drastically changed items the format of the question was altered; sometimes the revised item still assessed knowledge related to the same topic, and in other cases an entirely new item was created.

Appendix C displays information related to the performance of 3 items over time

with recommendations. The first item improved step by step from an overall


performance of 0 to 2, but still needs revision because the discrimination index is

low. The second item had good difficulty and a good discrimination index.

Revision was suggested because the corrected point biserial correlation was low.

After revision, performance remained at level three. The third item stayed the

same in all three versions. This item holds at a high performance level in all

versions. One disadvantage of exposing a good item too often is that the

difficulty index gradually gets larger as more examinees become familiar with

the item and are more likely to pick the correct answer.


Number of Items Assessed

By fall 2006, data had been analyzed for 362 items in one or more versions of

an assessment. Given the relatively small sample size available from preservice

examinees, it was necessary to rely also on data gathered from experienced staff.

Still, not all instruments were suitable for item analysis because some versions

were used with only a small number of examinees. This is especially true for the

end of Course 8 and some versions of the final knowledge assessments.

Using classical test theory techniques, 82 items (23%) have proven to be stable,

consistently performing well across multiple tests on all three parameters

(difficulty index, corrected point biserial correlation, and discrimination index).

A small number of stable items (18) have been revised because they did not meet

the additional requirement of measuring a specific learning objective well when

reviewed by a panel of content experts (trainers, OJT coaches, and TDCS

supervisors).

In addition, there were 103 items (28%) for which initial results are promising,

but items have not been included in enough versions to assess consistency of

good performance. More specifically, 59 items performed well or moderately

well on only one instrument, and SWORPS evaluators did not yet have enough data across instruments to know whether these items perform consistently well.

Forty-four items have an acceptable difficulty index but a small number of

responses (under 150). Those 44 items will stay as is until the sample size

increases with future administrations, since the other two parameters are very

sensitive to the sample size.

These results represent the data analyzed through fall 2006. Since then, new

instruments have been created and some have been archived. The number of active items in the pool (185 as of the fall 2006 analysis) had increased by the date of publication of this report.


Future Directions

Item Response Theory (IRT)

While classical test theory analysis is useful when constructing and evaluating

knowledge assessment instruments, it has several limitations. The item statistics

are sample dependent, and it is difficult to compare examinees’ results between

different assessments. To avoid these limitations of CTT, more and more

psychometricians are using IRT. One disadvantage of the IRT approach is that

models require large samples for stable parameter estimates. There is rich

literature on both methods. Sample size requirements for estimating item

parameters are not entirely clear. Tsutakawa and Johnson (1990) recommend a

sample size of n=500; however, other sources have suggested that n=300 is

sufficient (Chuah, Drasgow, & Luecht, 2006). For the three-parameter logistic model, a sample as large as n=1,000 is recommended.

For analyzing the certification exam, 12 instrument versions met the minimum

required sample size for utilizing IRT. At present, four instrument versions from

Course 2 have been analyzed using IRT. Item parameters, item characteristic

curves (ICC), item information curves, and model fit were estimated with

BILOG 3.0 using the two-parameter logistic model (2PL).

The purpose of this analysis is to compare the item statistics from the CTT with

the item parameters from the IRT to confirm whether results from IRT are

consistent with those from CTT, the approach used most extensively to date.

The Course 2 assessment, Version 3, will be used to illustrate the findings. Based

on item characteristic curves, 18 of the 20 items were classified in the same direction as recommended by CTT.


In a 2PL model used for analysis, the location parameter (b) represents the

difficulty of the item, with higher location values representing more difficult

items (see Figures 2–4). The slope parameter (a) represents the discrimination

power of the item, or how well the item differentiates between examinees of

higher and lower ability. Items with higher slope values discriminate better

between high- and low-ability examinees (see Figures 2–4). For this study, a good item is one that has a large slope parameter and a location parameter near 0.0. 7
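To show how the a and b parameters shape an item characteristic curve, the short sketch below evaluates the standard 2PL function. It is illustrative only, not BILOG output; the example parameters are those reported for the item in Figure 2, and some presentations of the 2PL include an additional scaling constant (often 1.7) in the exponent.

    # Illustrative 2PL item characteristic curve: probability of a correct response
    # as a function of ability (theta), item slope (a), and item location (b).
    import numpy as np

    def icc_2pl(theta, a, b):
        """P(correct response) under the two-parameter logistic model."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    # Parameters reported for the item in Figure 2 (a = 0.593, b = 0.105). At average
    # ability (theta = 0) the predicted probability is about 0.48, consistent with the
    # CTT difficulty index of 0.48 reported for that item.
    print(icc_2pl(np.linspace(-3, 3, 7), a=0.593, b=0.105))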

Three items were selected to illustrate agreement between CTT and IRT—one

item to present how the results from the two methods correspond and two items

to present how they differ.

In Figures 2 through 4, the item characteristic curves of selected items are shown

with a table of CTT item statistics and overall performance.

7 The guidelines for item selection are adopted from the IRT Modeling Lab, Website:

http://work.psych.uiuc.edu/irc.


Figure 2. Item Characteristic Curve and CTT Item Statistics of a Good Item According to Both Analyses (n=526)

[Item characteristic curve for CULTURE_15V1, plotting probability of a correct response against ability: a = 0.593, b = 0.105.]

CTT Item Statistics: Difficulty Index 0.48; Corrected Point Biserial Coefficient 0.26; Discrimination Index 0.64; Overall Performance 3


Figure 3. Item Characteristic Curve and CTT Item Statistics of an Item that Proved To Be Good by IRT and Poor by CTT (n=526)

[Item characteristic curve for STRENGTHS_01V2, plotting probability of a correct response against ability: a = 0.676, b = -2.104.]

CTT Item Statistics: Difficulty Index 0.89; Corrected Point Biserial Coefficient 0.23; Discrimination Index 0.27; Overall Performance 1


Figure 4. Item Characteristic Curve and CTT Item Statistics of an Item that Proved To Be Poor by IRT and Moderate by CTT (n=526)

[Item characteristic curve for STRENGTHS_BASED_01V2, plotting probability of a correct response against ability: a = 0.253, b = -2.456.]

CTT Item Statistics: Difficulty Index 0.80; Corrected Point Biserial Coefficient 0.09; Discrimination Index 0.24; Overall Performance 2


Conclusion

The overall conclusion of the analysis is that agreement between item analysis results from the two frameworks, CTT and IRT, was reasonably

good. Analysis using IRT techniques has just begun. Clearly, much work still

needs to be done.

Comparing Classical Test Theory and Item

Response Theory

It is important to continue to use classical test theory because these statistics

produce valuable information about the items. In fact, it is good practice to run a

CTT item analysis before estimating item parameters to remove items with zero

or negative corrected point biserial correlations. In recent years, a larger number

of psychometricians have analyzed their data with IRT. There are important

questions to be considered for future study. For example, how do statistics from

CTT compare with item parameters estimated by IRT? It is essential to

investigate whether the chosen 2PL model fits the data properly in order to

improve the item analysis and test design. Conclusions from IRT analysis should

correspond to those from CTT when the correct model is applied. After

determining a correct model, IRT can be used with confidence from one group of

examinees to another. Classical item analysis will nonetheless remain an essential part of the process.

Exploring Exam Bias

To supplement the analysis of individual items, a logical next step is to examine

possible test bias by looking at examinee demographics in relation to assessment

scores. Some factors that could be examined are gender, race, area of study, and


examinee region of employment. The purpose of bias analysis is to confirm that

no one group of examinees has an unfair advantage over the rest.
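This report does not prescribe a particular method for such a bias analysis. As one possible starting point, the minimal sketch below compares total-score distributions between demographic groups; the DataFrame and its column names (total_score and a grouping column) are hypothetical and not drawn from the report.

    # A minimal sketch of a first-pass comparison of assessment scores across
    # demographic groups; this is not a full bias analysis, and the column names
    # (`total_score`, the grouping column) are illustrative assumptions.
    import pandas as pd
    from scipy import stats

    def compare_groups(df, group_col, score_col="total_score"):
        """Summarize scores by group and, for exactly two groups, run Welch's t-test."""
        summary = df.groupby(group_col)[score_col].agg(["count", "mean", "std"])
        groups = [g[score_col].dropna() for _, g in df.groupby(group_col)]
        ttest = stats.ttest_ind(groups[0], groups[1], equal_var=False) if len(groups) == 2 else None
        return summary, ttest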


References

Baker, F. (2001). The basics of item response theory. ERIC Clearinghouse on Assessment and Evaluation.

BILOG-MG, MULTILOG, PARSCALE, and TESTFACT. Lincolnwood, IL: Scientific Software International.

Chuah, S. C., Drasgow, F., & Luecht, R. M. (2006). How big is big enough? Sample size requirements for CAST item parameter estimation. Applied Measurement in Education, 19(3), 241–251.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Stage, C. (2003). Classical test theory or item response theory: The Swedish experience (No. 42). Umea: Department of Educational Measurement.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, D.C.: American Educational Research Association.

Tsutakawa, R. K., & Johnson, J. C. (1990). The effect of uncertainty of item parameter estimation on ability estimates. Psychometrika, 55, 371–390.


Appendix A

Item Analysis Summary



Item Analysis Summary
End of Course 4, Version 4 (n=453)

Item  Short Name  Difficulty Index  Corrected Point Biserial Coefficient  Discrimination Index  Overall Performance

1 SEPT_14V2 0.78 0.12 0.34 3

2 LIFET_24V1 0.73 0.19 0.44 3

3 SAFETY_02V1 0.70 0.19 0.43 3

4 ASSESS_20V2 0.58 0.15 0.17 2

5 TOOLS_03V2 0.79 0.09 0.26 2

6 Funct_assess_01V2 0.58 0.25 0.60 3

7 STAGES_03V2 0.64 0.32 0.56 3

8 STAGESDEV_01V1 0.48 0.02 0.29 1

9 Tools_04V2 0.86 0.13 0.23 1

10 Analyzing_info_01V2 0.85 0.33 0.41 2

11 CHILDDEV_01V1 0.78 0.07 0.21 1

12 PROTA_26V1 0.71 0.22 0.51 3

13 TOOLS_05V2 0.82 0.07 0.25 0

14 NEWFAM__363V1 0.80 0.21 0.38 3

15 ASSESS_19V3 0.94 0.16 0.15 1

16 ATTACH_02V3 0.74 0.20 0.43 3

17 Domestic_violence_03V1 0.49 0.15 0.43 3

18 Underlying_needs_03V1 0.46 0.04 0.36 2

19 Signs_of_safety_01V1 0.75 0.31 0.52 3

20 STAGES_02V2 0.74 0.19 0.42 3



Appendix B

Comparison of Item Performance in End of

Course 4, Version 1 and Version 4


Comparison of Item Performance in End of Course 4, Version 1 and Version 4

Item                           Change in Version   Performance in Version 1   Performance in Version 4   Change in Performance*
1  SEPT_14V2                   Slightly            Performed Poorly           Performed Well             ↑
2  LIFET_24V1                  No Change           Performed Moderately       Performed Well             ↑
3  SAFETY_02V1                 No Change           Performed Well             Performed Well
4  ASSESS_20V2                 Drastically         Performed Poorly           Performed Moderately       ↑
5  TOOLS_03V2                  Slightly            Performed Poorly           Performed Moderately       ↑
6  Funct_assess_01V2           Drastically         Performed Well             Performed Well
7  STAGES_03V2                 Slightly            Performed Moderately       Performed Well             ↑
8  STAGESDEV_01V1              No Change           Performed Poorly           Performed Poorly
9  Tools_04V2                  Drastically         Performed Poorly           Performed Poorly
10 Analyzing_info_01V2         Drastically         Performed Poorly           Performed Moderately       ↑
11 CHILDDEV_01V1               Drastically         Performed Well             Performed Poorly           ↓
12 PROTA_26V1                  No Change           Performed Poorly           Performed Well             ↑
13 TOOLS_05V2                  Drastically         Performed                  Performed Poorly           ↓
14 NEWFAM__363V1               No Change           Performed Well             Performed Well
15 ASSESS_19V3                 Drastically         Performed Poorly           Performed Poorly
16 ATTACH_02V3                 Slightly            Performed Well             Performed Well
17 Domestic_violence_03V1      Drastically         Performed Well             Performed Well
18 Underlying_needs_03V1       Drastically         Performed Poorly           Performed Moderately       ↑
19 Signs_of_safety_01V1        Drastically         Performed                  Performed Well             ↑
20 STAGES_02V2                 Slightly            Performed Well             Performed Well

*The symbols ↑ and ↓ indicate change in overall performance; the direction of the arrow indicates whether the overall performance increased or decreased from version to version.


Appendix C

Comparison of Item Performance in End of

Course 4 Version 1, Version 3 and Version 4 for

Selected Items



Comparison of Item Performance in End of Course 4 Version 1, Version 3 and Version 4 for Selected Items

Version  Item Short Name  Difficulty Index  Corrected Point Biserial Coefficient  Discrimination Index  Overall Performance  Recommendations

V1  ASSESS_04V1   0.97  0.06  0.00  0
    Item has low corrected point biserial correlation and discrimination index. Difficulty index is very high. Items with low corrected point biserial correlation should be eliminated or substantially revised. Very low discrimination index supports this decision.

V3  ASSESS_20V1   0.83  0.16  0.25  1
    Item has a difficulty level (p=0.83) that is only slightly greater than recommended. Discrimination index is D=0.25. According to guidelines this item is marginal and needs revision.

V4  ASSESS_20V2   0.58  0.15  0.17  2
    Item has a good difficulty index and good corrected point biserial correlation, but the discrimination index recommends revision.

V1  STAGES_03V1   0.55  0.05  0.40  2
    Item has a good discrimination and difficulty index, but corrected point biserial correlation is low. Item needs revision.

V3  STAGES_03V2   0.63  0.15  0.48  3
    Retain without revision.

V4  STAGES_03V2   0.64  0.32  0.56  3
    Retain without revision.

V1  SAFETY_02V1   0.56  0.29  0.63  3
V3  SAFETY_02V1   0.61  0.14  0.44  3
V4  SAFETY_02V1   0.70  0.19  0.43  3
    Retain without revision. Difficulty index is increasing. Do not use item in subsequent versions.