27.03.2014 Views

SEKE 2012 Proceedings - Knowledge Systems Institute


adequate to another criterion. This measure is important for evaluating the inclusion relation among criteria empirically. During test case generation, we observed the effectiveness of each criterion in revealing the existing faults. Some programs from the Inspect, Helgrind and Rungta benchmarks contain injected faults. If a test case belonging to an adequate test set reveals a fault, the criterion is considered effective in revealing that kind of fault.

4 Experimental Study: Results

The analysis and interpretation of the results are based on the principles of descriptive statistics and hypothesis testing. Using analysis of variance (ANOVA) [9], we verified whether the null hypothesis can be rejected given the collected data set and the statistical tests. The descriptive analysis is useful for describing and graphically showing interesting aspects of the study. There are different perspectives from which to evaluate the cost of a testing criterion. In this study we chose the size of the adequate test set and the number of required elements. Table 1 presents cost information for some programs of the experiment, showing the size of the adequate test set and the number of feasible required elements for each testing criterion. The cost data obtained for all programs of the experiment suggest an order in which to apply these criteria, starting with the lowest-cost criteria: All-w-nodes (ANW), All-p-nodes (ANP), All-p-uses (APU), All-c-uses (ACU), All-nodes (AN), All-comm-c-uses (ACCU), All-comm-p-uses (ACPU), All-sync-uses (ASU), All-s-edges (AES) and All-edges (AE).

Criteria          MMult    Lazy01   Jacobi   Stateful06   Effectiveness
All-nodes         12/340   2/26     4/260    1/26         75%
All-p-nodes       12/54    1/11     2/31     1/7          50%
All-w-nodes       12/49    1/9      3/27     1/6          50%
All-edges         52/425   5/19     11/149   2/13         100%
All-s-edges       52/328   5/15     11/75    2/8          100%
All-c-uses        12/247   1/3      2/67     1/8          50%
All-p-uses        12/209   -        2/64     1/6          66%
All-sync-uses     52/270   4/9      -        2/6          75%
All-comm-c-uses   24/320   2/2      6/100    1/1          75%
All-comm-p-uses   -        4/6      7/61     -            100%
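The cost comparison can be illustrated with a small sketch. It assumes that each Table 1 cell reads "test set size / feasible required elements", as the surrounding text describes, and it ranks the criteria for MMult only. The names `MMULT_CELLS`, `parse_cell` and `rank_by_cost` are hypothetical helpers introduced for illustration, not part of the study's tooling.

```python
# Table 1 cells for MMult, assumed to mean
# "test set size / feasible required elements".
# "-" marks a cell that has no value in the table.
MMULT_CELLS = {
    "All-nodes": "12/340",
    "All-p-nodes": "12/54",
    "All-w-nodes": "12/49",
    "All-edges": "52/425",
    "All-s-edges": "52/328",
    "All-c-uses": "12/247",
    "All-p-uses": "12/209",
    "All-sync-uses": "52/270",
    "All-comm-c-uses": "24/320",
    "All-comm-p-uses": "-",
}


def parse_cell(cell):
    """Return (test_set_size, required_elements), or None for an empty cell."""
    if cell.strip() == "-":
        return None
    size, elements = cell.split("/")
    return int(size), int(elements)


def rank_by_cost(cells):
    """Order criteria by test set size, breaking ties by required elements."""
    parsed = {name: parse_cell(cell) for name, cell in cells.items()}
    known = {name: value for name, value in parsed.items() if value is not None}
    return sorted(known, key=lambda name: known[name])


ranking = rank_by_cost(MMULT_CELLS)
for position, name in enumerate(ranking, start=1):
    print(position, name, MMULT_CELLS[name])
```

This ranking uses a single program, whereas the order reported in the text was derived from the cost data of all programs in the experiment.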

\[
\text{effectiveness} = \frac{\text{number of faults found}}{\text{number of faults injected}} \times 100
\]
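As a minimal sketch of this formula, the function below computes the effectiveness percentage from two counts. The function name and the example counts are hypothetical and are not taken from the experiment.

```python
def effectiveness(faults_found, faults_injected):
    """Percentage of injected faults revealed by an adequate test set."""
    if faults_injected <= 0:
        raise ValueError("faults_injected must be positive")
    if not 0 <= faults_found <= faults_injected:
        raise ValueError("faults_found must lie between 0 and faults_injected")
    return faults_found / faults_injected * 100


# Hypothetical counts: 3 of 4 injected faults revealed.
print(effectiveness(3, 4))
```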

This analysis considered only the programs with injected faults, 23 programs in total. The results are presented in Table 1. The ACPU, AE and AES criteria are the most effective in revealing the faults. However, some kinds of faults are revealed by only a few testing criteria and, in some cases, less effective criteria are the ones that identify specific kinds of faults. The ANOVA results suggest that it is possible to reject the null hypothesis (NH2) and accept the alternative hypothesis (AH2), because the p-value obtained is 0.007, which is less than 0.05. These results indicate that there is a considerable difference in effectiveness among the testing criteria.

The equation below is used to calculate the strength of a criterion C1:

\[
\text{strength}_{C_1} = \frac{\text{number of elements covered by } T_{C_2}}{\text{total of required elements} - \text{infeasible elements}}
\]
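The strength computation can be sketched with sets of required elements. This is an illustration under stated assumptions: required elements are represented by hypothetical string identifiers, and `covered_by_t_c2` stands for the elements of C1 exercised by the test set adequate to C2.

```python
def strength(covered_by_t_c2, required, infeasible):
    """Fraction of the feasible required elements of C1 covered by T_C2."""
    feasible = required - infeasible
    if not feasible:
        raise ValueError("criterion has no feasible required elements")
    # Only feasible required elements count as covered.
    return len(covered_by_t_c2 & feasible) / len(feasible)


# Hypothetical required elements of a criterion C1.
required_c1 = {"n1", "n2", "n3", "n4", "n5"}
infeasible_c1 = {"n5"}
# Hypothetical elements of C1 exercised by the test set adequate to C2.
covered = {"n1", "n2", "n3"}

print(strength(covered, required_c1, infeasible_c1))
```

A value of 1.0 would mean that the test set adequate to C2 covers every feasible required element of C1.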

The strength data were analysed using the statistical method of cluster analysis in order to identify whether there is an inclusion relation among the criteria. Dendrogram charts are used to illustrate the results for the inclusion relation among criteria. If a criterion C1 includes another criterion C2 when the adequate test set T_C1 is applied, these criteria are at the same level in the graph. Analysing the dendrogram in Figure 1, we can conclude that the ACCU criterion includes the ANW criterion because both are at the same level in the graph. Based on this result, the null hypothesis (NH3) can be rejected and the alternative hypothesis (AH3) is accepted, indicating that the criteria can be complementary.

Table 1. Costs and effectiveness for some programs in the experiment.

The test set size was used to evaluate the cost and to perform the hypothesis testing. Based on the ANOVA results, the null hypothesis (NH1) is rejected because the p-value obtained for a pair of different criteria was 0.0009, which is less than the significance level of 0.05. This result suggests that there are differences among the costs of these testing criteria. In this case, the alternative hypothesis (AH1) is accepted.
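To make the hypothesis test concrete, the sketch below computes the one-way ANOVA F statistic in plain Python. The test set sizes are hypothetical and are not the study's data, and the function name is an assumption. Obtaining the p-value additionally requires the F distribution with the returned degrees of freedom, for which a statistics library such as SciPy (`scipy.stats.f_oneway`) could be used.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(group) for group in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = sum(sum(group) for group in groups) / n
    means = [sum(group) / len(group) for group in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, means)
    )
    # Variation of the samples around their own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, means)
        for x in group
    )
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical adequate test set sizes for three criteria over three programs.
sizes = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(sizes)
print(f_stat, df_b, df_w)
```

The null hypothesis is rejected when the p-value associated with this F statistic falls below the chosen significance level, 0.05 in this study.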

The effectiveness is calculated as the percentage of injected faults that are found.

Figure 1. Dendrogram for the All-comm-c-uses criterion.

The cluster analysis for the remaining criteria indicates
