12.01.2013 Views

Source Code Analysis Laboratory (SCALe) for Energy ... - CERT

Source Code Analysis Laboratory (SCALe) for Energy ... - CERT

Source Code Analysis Laboratory (SCALe) for Energy ... - CERT

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

• individual consistency (<strong>for</strong> example, how consistent is the individual in rendering the same<br />

judgment across time <strong>for</strong> the same or virtually the same flagged noncon<strong>for</strong>mity; often referred<br />

to as repeatability)<br />

• group accuracy (<strong>for</strong> example, what percentage of the time does a specific group of <strong>SCALe</strong><br />

analysts render the correct judgment)<br />

• group consistency (<strong>for</strong> example, what percentage of the time does a specific group of <strong>SCALe</strong><br />

analysts render the same judgment <strong>for</strong> a given flagged noncon<strong>for</strong>mity across time; often referred<br />

to as reproducibility)<br />

Any modern statistical package can easily determine the attribute agreement measures, which are<br />

interpreted as follows [Landis 1977]:<br />

• Less than 0 (no agreement)<br />

• 0–0.20 (slight agreement)<br />

• 0.21–0.40 (fair agreement)<br />

• 0.41–0.60 (moderate agreement)<br />

• 0.61–0.80 (substantial agreement)<br />

• 0.81–1 (almost perfect agreement)<br />

A need may arise to assess both accuracy and consistency of <strong>SCALe</strong> analysts’ judgments with<br />

measures that extend beyond the binary situation (correct or incorrect) to situations in which a<br />

judgment is a gradual measure of closeness to the right answer. In this case, analysts should use<br />

an alternative attribute agreement measure and interpret the output quite similarly to the Kappa<br />

coefficient, with results possible on the dimensions listed above. As such, Kendall coefficients<br />

serve well <strong>for</strong> judgments on an ordinal scale, in which incorrect answers are closer or farther away<br />

from the correct answer. A hypothetical example of a judgment on an ordinal scale would be if a<br />

<strong>SCALe</strong> analyst were asked to render judgment of the severity of a flagged noncon<strong>for</strong>mity, say on<br />

a 10-point scale. If the true severity is, <strong>for</strong> example, 8, and two <strong>SCALe</strong> analysts provided answers<br />

of 1 and 7, respectively, then a severity judgment of 7 would have a much higher Kendall coefficient<br />

than the severity judgment of 1.<br />

In conclusion, attribute agreement analysis may be conducted via small exercises with <strong>SCALe</strong><br />

analysts rendering judgments on a reasonably-sized list of different types of flagged noncon<strong>for</strong>mities<br />

mapped to the set of rules within the scope of a given code con<strong>for</strong>mance test.<br />

2.7.2.2 Formative Evaluation Assessment Using Attribute Agreement <strong>Analysis</strong><br />

<strong>SCALe</strong> analysts participate in a <strong>for</strong>mative evaluation assessment as part of their training and certification.<br />

Certification of a candidate as a <strong>SCALe</strong> analyst requires attribute agreement scores of<br />

80 percent or higher. In addition, acceptable thresholds <strong>for</strong> accuracy may be imposed separately<br />

<strong>for</strong> each rule.<br />

The <strong>for</strong>mative evaluation assessment implements a simple attribute agreement analysis as follows.<br />

First, the <strong>SCALe</strong> assessment administrator identifies a preliminary set of 20 to 35 different<br />

flagged noncon<strong>for</strong>mities (from the diagnostic output of a software system or code base) <strong>for</strong> the<br />

evaluation assessment. The administrator ensures that the preliminary set includes a variety of<br />

CMU/SEI-2010-TR-021 | 22

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!