
whether reports of reliability are adequate. For example, most researchers use a cutoff for alpha at .70 or higher as adequate (Hogan et al., 2000; Schmitt, 1996); however, others note .80 or higher to be the recommended level for alpha (Henson, 2001; Lance, Butts, & Michels, 2006). Yet higher is not always better, as an overly high alpha might be the result of redundancy in the items (Ryan & Wilson, 2014; Streiner, 2003; Tavakol & Dennick, 2011). Ryan and Wilson's (2014) brief version of the Professor-Student Rapport Scale, with a Cronbach's alpha of .83, is a good example of acceptable internal consistency that is not so high as to suggest issues with redundancy.
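
To make the alpha computation concrete, the following Python sketch (using hypothetical item responses, not data from any scale in this compendium) shows how Cronbach's alpha can be calculated from a respondents-by-items matrix of scores; the resulting value can then be judged against the .70 or .80 cutoffs noted above.

import numpy as np

def cronbach_alpha(item_scores):
    # item_scores: rows = respondents, columns = scale items
    items = np.asarray(item_scores, dtype=float)
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses from five students on a four-item, 5-point scale.
responses = [[4, 5, 4, 4],
             [3, 3, 2, 3],
             [5, 5, 5, 4],
             [2, 2, 3, 2],
             [4, 4, 4, 5]]
print(round(cronbach_alpha(responses), 2))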

Some of the most common methods of assessing reliability include inter-observer agreement, test-retest reliability, and measures of internal consistency. What follows is a brief discussion of these common ways of assessing reliability, as well as additional resources faculty can examine to learn more about these topics. It is important to keep in mind that this discussion is a brief introduction to concepts related to reliability; entire books are devoted to this topic (e.g., see Fink & Litwin, 1995; Litwin, 2003).

Inter-Observer

Assessing inter-observer agreement comes into play when researchers want to look at the reliability of responses given by two or more raters or judges (Warner, 2013). Depending on the goal of the researcher, the level of agreement between raters could be allowed to vary or not. For example, in a psychology statistics class, two teachers should be in perfect agreement regarding whether a student correctly computed a specific statistical analysis. In contrast, a group of teaching assistants might be asked to come into a classroom and judge students' poster projects for the best designed poster to examine if a certain teaching intervention was effective. In this case, perfect score agreement might not be needed or beneficial. Instead, we might look at how the judges ranked the posters from best to worst. Researchers can also examine the percentage of agreement between raters by counting the number of times the raters were in agreement and dividing by the total number of judgments made. However, this method of assessment does not take into account chance levels of agreement between raters and tends to work best when the rated variable is objective rather than subjective. To take into account chance agreements, one would need to calculate Cohen's kappa (κ) coefficient (Viera & Garrett, 2005), with coefficients below .40 being poor, between .40 and .59 fair, .60 and .74 good, and .75 and 1.00 excellent (Cicchetti, 1994). Sources of measurement error to be mindful of include judges' background and training regarding the variable of interest and the tool being used, as well as the prevalence of the finding (DeVellis, 2012; Guggenmoos-Holzmann, 1996; Viera & Garrett, 2005).
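
As a hedged illustration of the calculations described above, the following Python sketch (with hypothetical ratings, not data from an actual study) computes both simple percentage agreement and Cohen's kappa for two raters judging the same posters on a categorical rubric, showing how kappa corrects for agreement expected by chance.

from collections import Counter

def percent_agreement(rater_a, rater_b):
    # Number of matching judgments divided by the total number of judgments.
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters independently pick the same category.
    expected = sum((counts_a[c] / n) * (counts_b[c] / n)
                   for c in set(rater_a) | set(rater_b))
    return (observed - expected) / (1 - expected)

# Hypothetical categorical ratings of ten posters by two raters.
rater_1 = ["good", "good", "poor", "excellent", "good",
           "poor", "excellent", "good", "good", "poor"]
rater_2 = ["good", "poor", "poor", "excellent", "good",
           "poor", "good", "good", "excellent", "poor"]
print(percent_agreement(rater_1, rater_2))  # ignores chance agreement
print(cohens_kappa(rater_1, rater_2))       # corrects for chance agreement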

Test-Retest

Reliable instruments should provide consistent scores if used to assess a variable at one time point and then again at another time point (so long as, theoretically, that variable should remain constant across time); however, the error of measurement will vary each time. A correlation could then be computed to assess consistency in scores for the two assessment points.
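
A minimal sketch of that correlation, using hypothetical scale totals from two administrations of the same instrument, is shown below; a high positive correlation would suggest adequate test-retest reliability.

import numpy as np

# Hypothetical total scores for the same eight respondents at two time points.
time_1 = [22, 30, 17, 25, 28, 19, 24, 27]
time_2 = [24, 29, 16, 26, 27, 21, 23, 28]

# Pearson correlation between the two administrations.
r = np.corrcoef(time_1, time_2)[0, 1]
print(round(r, 2))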
