SEKE 2012 Proceedings - Knowledge Systems Institute

TABLE I. CAPTURE-RECAPTURE MODELS AND ESTIMATORS [8, 10, 16, 18, 19, 22]

Model | Variation Source | Estimators Belonging to Each CR Model
M o   | All inspectors have the same ability, and all defects are equally likely to be detected.    | Unconditional Likelihood (M o-UMLE); Conditional Likelihood (M o-CMLE); Estimating Equations (M o-EE)
M t   | Inspectors differ in their abilities, but all defects are equally likely to be found.       | M t-UMLE; M t-CMLE; M t-EE
M h   | Inspectors have the same ability, but defects differ in their probability of being found.   | Jackknife (M h-JK); Sample Coverage (M h-SC); M h-EE
M th  | Inspectors differ in their ability, and defects differ in their probability of being found. | M th-SC; M th-EE

it can be determined if two people report the same fault). However, because some faults are easier to find than others, and because inspectors have different fault detection abilities, the equal capture probability assumption is not met [18].

To relax these assumptions, four different CR models are built around the two sources of variation: inspector capability and fault detection probability. Table I shows the four CR models along with their source(s) of variation. Each CR model has a set of estimators that use different statistical approaches to produce the estimates; these are also listed in Table I. The mathematical details of the CR estimators are beyond the scope of this paper [18, 22].

The input data used by all the CR estimators is organized as a matrix whose rows represent faults and whose columns represent inspectors. A matrix entry is 1 if the fault was found by the inspector and 0 otherwise.
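As an illustration of how an estimator consumes this matrix, the sketch below computes the standard first-order jackknife estimate (the form underlying M h-JK): N̂ = D + ((t − 1)/t)·f₁, where D is the number of distinct faults found, t the number of inspectors, and f₁ the number of faults found by exactly one inspector. The capture matrix here is hypothetical, not data from the study.

```python
# First-order jackknife estimate of the total fault count from a
# capture matrix. Rows are faults, columns are inspectors; an entry
# is 1 if that inspector found that fault, 0 otherwise.
# The matrix below is hypothetical data for illustration only.

def jackknife_estimate(matrix):
    d = len(matrix)                                  # D: distinct faults found
    t = len(matrix[0])                               # t: number of inspectors
    f1 = sum(1 for row in matrix if sum(row) == 1)   # f1: singleton faults
    return d + ((t - 1) / t) * f1

capture_matrix = [
    [1, 0],  # found only by inspector 1
    [1, 1],  # found by both inspectors
    [0, 1],  # found only by inspector 2
    [1, 0],  # found only by inspector 1
]

print(jackknife_estimate(capture_matrix))  # → 5.5
```

With D = 4 discovered faults, t = 2 inspectors, and f₁ = 3 singletons, the estimate is 4 + (1/2)·3 = 5.5 total faults, i.e., roughly 1.5 faults remaining post-inspection.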

CR was introduced to software inspections by Eick et al., who applied it to real fault data from AT&T. A major result from this study was the recommendation that an artifact should be re-inspected if more than 20% of the total faults remain undetected [5, 10]. Following this study, various empirical studies in SE have evaluated the ability of CR models to accurately estimate the total fault count using artifacts with a known number of seeded defects [18]. In addition, our prior research evaluated the effect of inspection team size on the quality of the CR estimates [19]. A common finding across these studies is that the estimators underestimate the actual fault count when the number of inspectors is small, but improve with more faults and inspectors. While there is evidence of the ability of the CR estimators to accurately estimate the total fault count [16], CR research has neglected the cost spent and the cost saved by an inspection. This research extends our prior work by evaluating the cost-effectiveness of software inspections using the CR estimates on real artifacts. These results will provide guidance on whether the CR estimators can be used to evaluate the cost-effectiveness of an inspection process in real software projects where the fault count is unknown beforehand.

III. LITERATURE ON INSPECTION VS. TESTING COST RATIO<br />

As mentioned earlier (Section II.A), calculating the cost saved by an inspection requires a count of the faults remaining post-inspection and the average cost to detect a fault in the testing stage. While the CR method can be used to estimate the remaining faults, the average cost to detect a fault in the testing stage is not available at the end of the inspection cycle. In this research, the average cost (time spent in hours) to find a fault during testing is calculated as a multiple of the average cost (time spent in hours) to find a fault during inspection. This section presents a summary of findings from different software organizations that reported data on the cost (staff-hours) spent to find a fault during inspections versus testing. This cost ratio is then used to calculate the cost savings, the virtual testing cost, and the M k value of the inspections using (1).

Major results regarding the cost (in staff-hours) to find a fault during inspections versus the cost to find a fault during testing show a cost ratio of 1:6 [2, 4, 12, 17, 22]. These values are based on actual reported data. In addition, Briand [6], based on published data, has provided probability distribution parameters for the average effort of different fault detection techniques, according to which the most likely value for design inspections is 1.58 hours per fault and, for testing, 6 hours per fault. These values were derived from various studies on the cost of finding faults in design reviews, code reviews, and testing. Different studies in the literature give different estimates because of differences in study settings, software processes, fault severity, review techniques, and other factors. To find the most appropriate value, we computed the median of the reported cost ratio values that resulted from precise data collection; we did not consider approximations, estimates, or data whose origins were unclear. The resulting median cost ratio is 1:5.93. Therefore, for this research, an inspection-to-testing cost ratio of 1:6 was used to calculate the cost-effectiveness of the inspection process.
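To make the role of the 1:6 ratio concrete, the sketch below shows one simplified form of the cost-effectiveness computation. It is an illustration under stated assumptions, not the study's exact Eq. (1): here the virtual testing cost is the cost of finding all faults in testing alone, the cost saved is the avoided testing cost for the faults the inspection removed minus the inspection effort, and M k is the cost saved normalized by the virtual testing cost. All numeric inputs are hypothetical.

```python
# Hedged sketch of an inspection cost-effectiveness calculation using
# the 1:6 inspection-to-testing cost ratio derived in Section III.
# This simplified formula is for illustration; the study's actual
# metric is defined by Eq. (1) in the paper.

COST_RATIO = 6  # testing cost per fault = 6 x inspection cost per fault

def mk_value(faults_found, total_faults, inspection_hours, insp_cost_per_fault):
    test_cost_per_fault = COST_RATIO * insp_cost_per_fault
    # Cost of testing if no inspection had been performed at all.
    virtual_testing_cost = total_faults * test_cost_per_fault
    # Testing cost avoided for removed faults, net of inspection effort.
    cost_saved = faults_found * test_cost_per_fault - inspection_hours
    return cost_saved / virtual_testing_cost

# Hypothetical inspection: 20 of an estimated 25 faults found,
# 20 staff-hours of inspection effort, 1 hour per fault in inspection.
print(round(mk_value(20, 25, 20.0, 1.0), 3))  # → 0.667
```

In the study's setting, `total_faults` would come from a CR estimate (faults found plus estimated remaining faults) rather than being known in advance.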

IV. STUDY DESIGN<br />

Prior software engineering research has validated the fault detection effectiveness of inspections at the early stages of development [1, 11]. However, there is a lack of empirical research on the benefits of inspections in terms of the extent to which testing costs can be reduced by inspecting early software artifacts. Furthermore, while inspections are effective, they cannot by themselves provide insight into the remaining faults, which is required to calculate the fault rework savings. To that end, the CR method has been evaluated and shown to provide a reliable estimate of the faults remaining post-inspection using artifacts with seeded faults [16]. However, there is a lack of research on the CR estimators' ability to provide a reliable estimate of the remaining faults when the actual fault count of the software artifact is not known beforehand. Therefore, this paper evaluates the ability of the CR estimators to accurately predict the cost-effectiveness of the inspection process using fault data from four real software artifacts that contained naturally occurring faults. In addition, each artifact was inspected twice, which allowed an analysis of the CR estimators' ability to accurately estimate the cost-effectiveness after each inspection cycle.

A. Research Goal<br />

The main goal of this study is to evaluate the ability of the CR estimators to provide an accurate cost-effectiveness value for an inspection process by comparing the M k values based on the CR estimates against the actual M k value after each inspection.
