Cut-off Scores of the New Chinese Proficiency Test Based on ... - ALTE

Cut-off Scores of the New Chinese Proficiency Test Based on the Angoff Standard Setting Method

Steering Committee for the Test Of Proficiency-Huayu

Lin, Ling-Ying

Lan, Pei-Jiun


Background

Renamed:

old: Test Of Proficiency-Huayu (TOP)

new: Test of Chinese as a Foreign Language (TOCFL)

Revised:

1. based on CEFR

2. 120 items → 100 items

3. new item types

Purpose 

To identify the cut-off scores for the three levels of the TOCFL using a standard setting method.

Literature Review 

Introduction to TOCFL 

Listening Comprehension (50 items)

Level  Short Dialogue (1 turn)  Short Dialogue (two-turn)  Long Dialogue (multiple-turn)  Monologue
B1     20                       15                         --                             15
B2     10                       10                         15                             15
C1     --                       10                         20                             20

Reading Comprehension (50 items)

Level  Cloze  Authentic Material  Short Essay
B1     20     15                  15
B2     15     10                  25
C1     15     --                  35


Definition of standard setting

Cizek (1993)

“standard setting is the proper following of a prescribed, rational system of rules or procedures resulting in the assignment of a number to differentiate between two or more states or degrees of performance.”



Kane (1994)

“It is useful to draw a distinction between the passing score, defined as a point on the score scale, and the performance standard, defined as the minimally adequate level of performance for some purpose…The performance standard is the conceptual version of the desired level of competence, and the passing score is the operational version.”



Tannenbaum and Wylie (2008)

“Standard setting is a general label for a number of approaches commonly used to identify test scores that support decisions about test takers’ (candidates’) level of knowledge, skill, proficiency, mastery, or readiness.”



Cizek and Bunch (2007)

“To some degree, then, because standard setting necessarily involves human opinions and values, it can also be viewed as a nexus of technical, psychometric methods and policy making.”


The Angoff method

1. minimally acceptable person

2. probability statements

3. aggregating individual standards









Probability statements for the minimally acceptable person:

hard: 15% 25% 35%

medium: 45% 55% 65%

easy: 75% 85% 95%






Aggregating individual standards:

         item1  item2  item3  …  item50  SUM
J1       0.75   0.65   0.65   …  0.55    31.7
J2       0.85   0.75   0.65   …  0.45    32.1
.
.
.
J11      0.75   0.75   0.65   …  0.45    31.9
average
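The aggregation step above can be sketched in a few lines of Python. This is a minimal illustration, not the committee's actual procedure: it assumes each judge's individual standard is the sum of that judge's item probabilities (the SUM column) and that the panel standard is the mean of those sums (the "average" row). The function name `angoff_cut_score` and the tiny 3-judge, 4-item data set are hypothetical; only the four visible item columns of J1, J2 and J11 are reused as sample values.

```python
from statistics import mean, stdev


def angoff_cut_score(ratings):
    """Aggregate Angoff probability statements.

    ratings[j][i] is judge j's estimated probability that a minimally
    acceptable person answers item i correctly.
    Returns (per-judge sums, panel mean, panel standard deviation).
    """
    judge_sums = [sum(judge) for judge in ratings]
    return judge_sums, mean(judge_sums), stdev(judge_sums)


# Hypothetical data: 3 judges x 4 items (a real panel rates every item).
ratings = [
    [0.75, 0.65, 0.65, 0.55],
    [0.85, 0.75, 0.65, 0.45],
    [0.75, 0.75, 0.65, 0.45],
]

judge_sums, panel_mean, panel_sd = angoff_cut_score(ratings)
print(judge_sums, panel_mean, panel_sd)
```

With all 50 items and all 11 judges, the panel mean would be the raw-score standard that is then turned into a cut-off score.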

Method 

Panelists

                              Number  Percentage
Gender      Female            11      100.0%
            Male              0       0.0%
Background  Chinese Teaching  5       45.5%
            Linguistics       4       36.4%
            Psychometrics     2       18.2%

Method: procedure

1. Panelist familiarization

2. Definition of minimally acceptable person

3. Two rounds of probability judgments

4. Analysis of internal validity












round 1 judgment → feedback & discussion (item P value, IRT difficulty parameter) → round 2 judgment






Result

Cut-off scores

                       Test Level  mean of 1st round (SD)  mean of 2nd round (SD)  Cut-off scores
Reading Comprehension  B1          34.51 (2.862)           33.89 (1.778)           34
                       B2          32.17 (1.616)           32.10 (1.409)           32
                       C1          29.96 (1.905)           30.06 (1.686)           30
Listening              B1          32.18 (1.632)           32.31 (1.129)           32
                       B2          31.56 (1.697)           31.84 (1.498)           32
                       C1          32.55 (0.983)           32.45 (0.933)           32
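The reported cut-off scores are consistent with rounding each second-round mean to the nearest whole item. The slides do not state the rounding rule, so the sketch below treats it as an assumption and simply checks it against the reported figures; the dictionary names are illustrative.

```python
# Second-round means as reported in the Result table.
second_round_means = {
    ("Reading", "B1"): 33.89,
    ("Reading", "B2"): 32.10,
    ("Reading", "C1"): 30.06,
    ("Listening", "B1"): 32.31,
    ("Listening", "B2"): 31.84,
    ("Listening", "C1"): 32.45,
}

# Assumed rule: round to the nearest whole number of items.
cut_scores = {key: round(value) for key, value in second_round_means.items()}

for (section, level), cut in cut_scores.items():
    print(section, level, cut)
```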

Result

Correlation analysis between the estimated results and IRT difficulty parameter of test items

                  Number of test items  1st round  2nd round
Reading    B1     47                    -0.585 **  -0.760 **
           B2     34                    -0.646 **  -0.796 **
           C1     47                    -0.669 **  -0.764 **
Listening  B1     50                    -0.607 **  -0.772 **
           B2     47                    -0.785 **  -0.894 **
           C1     46                    -0.648 **  -0.802 **
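A negative correlation is the expected direction here: the harder an item is (higher IRT difficulty), the lower the probability judges should assign to the minimally acceptable person. The sketch below shows one way such a check could be computed. It assumes a Pearson correlation between each item's mean judged probability and its difficulty parameter; the slides do not name the coefficient used, and the five data points are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical items: IRT difficulty and panel-mean judged probability.
irt_difficulty = [-1.2, -0.4, 0.1, 0.8, 1.5]
mean_judged_probability = [0.85, 0.75, 0.65, 0.55, 0.45]

r = pearson_r(irt_difficulty, mean_judged_probability)
print(round(r, 3))
```

A coefficient moving closer to -1 from the first to the second round, as in the table, suggests the judgments track empirical difficulty more closely after feedback and discussion.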

Result

Variation of participants and test items

                    Reading                          Listening
Test Level  Source  1st round      2nd round         1st round      2nd round
B1          σ²_i    0.004  24.9%   0.007  50.1%      0.002  22.6%   0.003  41.5%
            σ²_p    0.004  23.1%   0.001   9.2%      0.001  10.1%   0.000   5.9%
            σ²_ip   0.008  52.0%   0.006  40.7%      0.007  67.3%   0.004  52.6%
B2          σ²_i    0.007  43.6%   0.008  56.2%      0.002  22.9%   0.003  38.1%
            σ²_p    0.001   6.3%   0.001   5.7%      0.001  12.2%   0.001  10.3%
            σ²_ip   0.008  50.1%   0.005  38.2%      0.006  64.9%   0.005  51.6%
C1          σ²_i    0.004  32.3%   0.005  43.1%      0.003  30.1%   0.004  41.6%
            σ²_p    0.001  11.7%   0.001  10.9%      0.000   3.0%   0.000   2.9%
            σ²_ip   0.007  56.0%   0.005  46.0%      0.007  66.9%   0.005  55.5%

σ²_i: test items; σ²_p: participants; σ²_ip: item-by-participant interaction
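Variance components of this kind are commonly estimated from a two-way ANOVA on the panelist-by-item rating matrix. The sketch below assumes a fully crossed design with one rating per cell, so the interaction and residual error cannot be separated; the slides do not describe the estimation software or method, and the function name and sample matrix are hypothetical.

```python
def variance_components(ratings):
    """Estimate variance components for a crossed panelist x item design.

    ratings[p][i] is panelist p's judged probability for item i.
    Returns estimates for items ("i"), panelists ("p") and the
    interaction confounded with residual error ("ip").
    """
    n_p = len(ratings)
    n_i = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n_p * n_i)
    row_means = [sum(row) / n_i for row in ratings]
    col_means = [sum(row[i] for row in ratings) / n_p for i in range(n_i)]

    # Sums of squares for panelists, items, and the residual.
    ss_p = n_i * sum((m - grand) ** 2 for m in row_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_res = ss_total - ss_p - ss_i

    # Mean squares.
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Expected-mean-square solutions; negative estimates are set to zero.
    return {
        "i": max((ms_i - ms_res) / n_p, 0.0),
        "p": max((ms_p - ms_res) / n_i, 0.0),
        "ip": max(ms_res, 0.0),
    }


# Hypothetical 2 panelists x 2 items, purely additive (no interaction).
components = variance_components([[0.5, 0.7], [0.6, 0.8]])
print(components)
```

Expressing each component as a share of their total gives percentages like those in the table.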

Result

G-coefficient

            Reading               Listening
Test Level  1st round  2nd round  1st round  2nd round
B1          0.811      0.917      0.787      0.897
B2          0.897      0.936      0.795      0.890
C1          0.825      0.904      0.832      0.892
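One plausible reading of these coefficients is a relative G-coefficient with test items as the object of measurement and panelists as the facet being averaged over. That formula is an assumption, not something the slides state; under it, the Listening B1 variance shares from the variance-component table and 11 panelists reproduce the reported 0.787 and 0.897. The function name is illustrative.

```python
def g_coefficient(var_items, var_interaction, n_panelists):
    """Relative G-coefficient, assuming items are the object of measurement.

    G = var_items / (var_items + var_interaction / n_panelists)
    Percent shares can be passed instead of raw variances because
    the ratio is unaffected by a common scale factor.
    """
    return var_items / (var_items + var_interaction / n_panelists)


# Listening B1 variance shares (percent), 11 panelists.
g_first_round = g_coefficient(22.6, 67.3, 11)
g_second_round = g_coefficient(41.5, 52.6, 11)

print(round(g_first_round, 3), round(g_second_round, 3))
```

The rise from the first to the second round reflects the smaller interaction share after feedback, meaning panelists ordered the items more consistently.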

Conclusion 


Cut-off score

B1 66

B2 64

C1 62

Limitation: no male panel members

Future study: examine the cut-off scores in real tests
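The overall cut-off scores match the sum of the Reading and Listening cut-offs reported in the Result table. The slides do not say explicitly that the totals were obtained this way, so the short check below treats it as an observation rather than the documented rule.

```python
# Section cut-offs from the Result table.
reading_cut = {"B1": 34, "B2": 32, "C1": 30}
listening_cut = {"B1": 32, "B2": 32, "C1": 32}

# Assumed rule: total cut-off = Reading cut-off + Listening cut-off.
total_cut = {
    level: reading_cut[level] + listening_cut[level] for level in reading_cut
}

print(total_cut)
```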
