14.11.2014 Views

Teacher – Full Teacher Evaluation Information - North Dakota ...



quality of every classroom or a global estimate of the quality of a given program or locality? Are there high stakes for CLASS scores? How will you use your data to improve teacher-child interactions on a classroom-by-classroom basis? Answers to these questions should guide your ultimate data collection plan.

Do we need to send more than one observer to each classroom?

One of the best ways to improve the reliability of CLASS scores is to have multiple observers rate the same classroom. Although the expense of this “double coding” is often prohibitive, we still recommend that at least a portion (between 5 and 15%) of classroom observations be double coded to assess reliability. Having this data will help you communicate to stakeholders about the fairness of the tool in practice. For example, Virginia has reported obtaining approximately 90% reliability (scores within one point) in the field.
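As a rough illustration of how a "within one point" reliability figure could be computed from double-coded observations, the sketch below compares paired ratings from two observers. The function name, the tolerance parameter, and the ratings are hypothetical and are not taken from the guide or from the Virginia data; the guide does not specify the exact calculation.

```python
def within_one_agreement(scores_a, scores_b, tolerance=1):
    """Share of paired ratings that differ by at most `tolerance` points.

    Hypothetical helper: each position in the two lists is assumed to be
    the same classroom/dimension rated independently by two observers.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both observers must rate the same items")
    if not scores_a:
        raise ValueError("no paired ratings supplied")
    hits = sum(1 for a, b in zip(scores_a, scores_b) if abs(a - b) <= tolerance)
    return hits / len(scores_a)


# Made-up ratings for ten double-coded items (illustrative only).
observer_1 = [5, 4, 6, 3, 5, 2, 6, 4, 5, 3]
observer_2 = [5, 5, 4, 3, 6, 2, 6, 5, 5, 4]

rate = within_one_agreement(observer_1, observer_2)
print(f"Within-one-point agreement: {rate:.0%}")
```

In this made-up sample, nine of the ten pairs fall within one point of each other, so the agreement rate is 90%.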

How do we decide how many classrooms to observe and how long each should be observed?

The answer to this question depends greatly on the goals you have for data collection. Just as survey researchers work out how best to draw a random sample on which to base judgments about election polling numbers, complex analyses exist that can help you develop a sampling plan to match your goals. While it is beyond the scope of this overview document to provide a detailed answer, general guidance and a few examples are provided below. They highlight the trade-offs to consider when deciding how many classrooms to observe and for how long.

General Principles to Consider:

1. The more ratings you are able to obtain and aggregate, the more stable your estimates of typical classroom interactions will be.

2. In most cases, we find that a two-hour observation (4 CLASS cycles) provides a reliable estimate of the overall status of teacher-child interactions in a classroom.

3. There typically is more variance in CLASS scores within an organization (program, school, grantee, etc.) than there is between organizations. This means you have to assess a fair number of classrooms within any one organization to get a reliable estimate for that organization.

4. Even if all observers are CLASS certified, there will be small, systematic differences in their scoring. Some observers may tend to give slightly higher scores, while others may tend to be slightly more critical. Although slight differences fall within our threshold for “reliability,” collectively they can produce skewed results. The best way to minimize any potential “observer effects” is to randomly assign observers to classrooms within any organization (program, school, grantee, etc.).
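The random assignment recommended in principle 4 could be sketched as follows. This is only one way to do it, under assumptions not stated in the guide: the classroom order is shuffled and observers are then dealt out in turn, so that workloads stay balanced. The function name, the seed parameter, and the classroom and observer labels are all hypothetical.

```python
import random


def assign_observers(classrooms, observers, seed=None):
    """Randomly assign one observer to each classroom in an organization.

    Hypothetical sketch: shuffle the classrooms, then deal observers out
    round-robin so that no observer covers more than one classroom beyond
    any other observer.
    """
    if not observers:
        raise ValueError("at least one observer is required")
    rng = random.Random(seed)      # a seed makes the plan reproducible
    shuffled = list(classrooms)    # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return {
        room: observers[i % len(observers)]
        for i, room in enumerate(shuffled)
    }


# Made-up labels for a single organization (illustrative only).
rooms = [f"Room {n}" for n in range(1, 8)]
team = ["Observer A", "Observer B", "Observer C"]

plan = assign_observers(rooms, team, seed=42)
for room in rooms:
    print(room, "->", plan[room])
```

Because the assignment depends only on the shuffle, an observer's tendency to score slightly high or low is not tied to any particular classroom.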

To illustrate the trade-offs embedded in these decisions, consider two examples: the Office of Head Start (OHS) and the City of Chicago.

OHS conducts triennial reviews of all Head Start grantees. Each grantee typically oversees multiple Head Start programs. Monitoring visits are designed to provide feedback at the grantee level. OHS was interested in including the CLASS as part of this review process. OHS did not intend to share or analyze data at the program or classroom level.

CLASS Implementation Guide 39
